Introduction

Some groups of mussels can produce a proteinaceous, glue-like material known as the byssus, which is made from an array of foot proteins (fps). The byssus consists mainly of four parts: plaque, thread, stem and root. Individual threads merge proximally to form the stem, and the base of the stem (the root) is deeply anchored at the base of the animal's foot. Each byssal thread terminates distally in a flattened plaque that mediates adhesion to the substratum1,2,3,4. Each part of the byssal complex is formed by the self-assembly of secretory products originating from four distinct glands enclosed in the mussel foot4,5. Mussel foot proteins (Mfps) have mastered the ability to bind diverse substrata through the adhesive plaques. 3,4-Dihydroxyphenylalanine (DOPA), a core constituent of Mfps, is formed by the post-translational hydroxylation of tyrosine. During this post-translational modification, polyphenol oxidases catalyze the o-hydroxylation of monophenols (tyrosine) to o-diphenols (DOPA), and the adhesive ability of Mfps correlates strongly with their DOPA content2,5,6,7.

Byssal threads are engineered to withstand the high mechanical loads applied by waves and currents in subtidal and intertidal zones4,8. In recent decades, understanding of the origin and diversity of the Bivalvia and of Mfps has advanced significantly. The remarkable moisture-resistant adhesive properties of Mfps have inspired the development of a wide variety of functional materials2,4,9. Designing mussel-mimetic adhesive materials first requires an understanding of the specific physicochemical and functional properties of each Mfp. This work aims to characterize the physicochemical, structural and functional properties of all currently available Mfps from various species, and to describe the evolutionary diversification and molecular-clock-level speciation of byssus-producing bivalves and their Mfps. The structural modeling and functional analysis of Mfps help to identify which Mfps are most promising for specific industrial and therapeutic applications.

Results and Discussion

Distribution frequency of available Mfps

A total of 78 Mfps are available in the NCBI Protein database. Of these, 34 are from Mytilus californianus Conrad, 1837 (Mytilida: Mytilidae), 26 from Mytilus unguiculatus Valenciennes, 1858 (Mytilida: Mytilidae) (synonym of M. coruscus), five from Perna viridis (Linnaeus, 1758) (Mytilida: Mytilidae), four from Perna canaliculus (Gmelin, 1791) (Mytilida: Mytilidae), three from Mytilus galloprovincialis Lamarck, 1819 (Mytilida: Mytilidae), two each from Mytilus edulis Linnaeus, 1758 (Mytilida: Mytilidae) and Mizuhopecten yessoensis (Jay, 1857) (Pectinida: Pectinidae), and one each from Atrina pectinata (Linnaeus, 1767) (Pteriida: Pinnidae) and Dreissena polymorpha (Pallas, 1771) (Myida: Dreissenidae). Scientific names of the selected bivalve species were validated in the Catalogue of Life: 2019 Annual Checklist (http://www.catalogueoflife.org/annual-checklist/2019) and in the World Register of Marine Species (WoRMS) (http://www.marinespecies.org/index.php).
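As a quick consistency check, the per-species counts above can be tallied in a few lines of Python. This is a minimal sketch: the dictionary simply restates the counts given in the text, and the helper name `percent_by_species` is hypothetical.

```python
# Per-species Mfp counts as reported in the text (NCBI Protein entries).
MFP_COUNTS = {
    "Mytilus californianus": 34,
    "Mytilus unguiculatus": 26,
    "Perna viridis": 5,
    "Perna canaliculus": 4,
    "Mytilus galloprovincialis": 3,
    "Mytilus edulis": 2,
    "Mizuhopecten yessoensis": 2,
    "Atrina pectinata": 1,
    "Dreissena polymorpha": 1,
}


def percent_by_species(counts):
    """Return each species' share of all entries, in percent (1 decimal)."""
    total = sum(counts.values())
    return {species: round(100 * n / total, 1) for species, n in counts.items()}


if __name__ == "__main__":
    print(sum(MFP_COUNTS.values()))  # 78
    print(percent_by_species(MFP_COUNTS)["Mytilus californianus"])  # 43.6
```

The total reproduces the 78 entries stated above, with M. californianus contributing the largest share.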

Molecular modeling of Mfps

Structural information for Mfps is not available in the PDB (Protein Data Bank), yet a complete structure of each Mfp is required to analyze its structural and functional aspects. Comparative homology modeling of the Mfps was therefore performed with the MUSTER server10, and the best template was selected to build each full-length protein model (the template used for each Mfp is listed in Table 1). All protein models were visualized using PyMOL and EzMol 2.1 (Fig. 1 and Supplementary Data S2, Table 2).

Table 1 List of Mfps used for in silico analysis, with results of Ramachandran plot analysis (generated in PDBsum and PROCHECK).
Figure 1

Ribbon diagram of the three-dimensional structure of mussel foot proteins (Mfps), visualized in EzMol 2.1.

Validation of Mfps model

In a good-quality protein model, more than 90% of residues are expected to lie in the core (most favoured) and additionally allowed regions of the Ramachandran plot11. On this basis, 67 of the 78 Mfp models were classified as highly stable, with 90% or more of their residues in the core and additionally allowed regions, and the remaining 11 as moderately stable, with 85–90% of their residues in these regions (Table 1 and Fig. 2).
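The classification rule above can be written as a short function. This is a minimal sketch, assuming the per-region residue counts are taken from a PROCHECK summary (PROCHECK computes these percentages over non-glycine, non-proline residues); the function names are hypothetical, and the handling of the exact 90% and 85% boundaries is an assumption.

```python
def core_plus_allowed_pct(core, allowed, generous, disallowed):
    """Percentage of residues in the core + additionally allowed regions."""
    total = core + allowed + generous + disallowed
    if total == 0:
        raise ValueError("no residues to classify")
    return 100.0 * (core + allowed) / total


def classify_model(pct):
    """Thresholds follow the text: >=90% highly stable, 85-90% moderately stable."""
    if pct >= 90.0:
        return "highly stable"
    if pct >= 85.0:
        return "moderately stable"
    return "below 85%"


if __name__ == "__main__":
    # Hypothetical residue counts per Ramachandran region, not values from Table 1.
    pct = core_plus_allowed_pct(core=80, allowed=12, generous=5, disallowed=3)
    print(pct, classify_model(pct))  # 92.0 highly stable
```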

Figure 2

Ramachandran plots of mussel foot proteins (Mfps), generated with PROCHECK via PDBsum.

Promotif documentation of Mfps

The Mfp models were further analyzed with PROMOTIF through the PDBsum server12, which characterizes secondary-structure motifs such as sheets, beta-alpha-beta units, beta hairpins, psi loops, strands, helices, helix-helix interactions, beta turns, gamma turns and disulfides. Disulfide bonds are present only in Mcfp2, Mcfp6 v1, Mcfp6 v3, Mcfp14, Mcfp15, Mcfp16, Mcfp18, Apfp1, Mefp2, Mufp2, Mufp6 v9, Pvfp3, Pvfp5 and Pvfp6. Disulfide bonds form under oxidizing conditions and play an important role in the folding and stability of extracellular proteins5,13,14,15; as covalent cross-links, they contribute to the strength of the protein. All Mfps contain beta turns in varying numbers, whereas psi loops occur only in Mcfp11. Overall, the PROMOTIF results indicate that most Mfps have moderate structural complexity, owing to their limited number of secondary-structure elements (Table 2).
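PROMOTIF detects disulfides from the model geometry. As a simplified illustration only (not PROMOTIF's actual criteria), the sketch below pairs cysteine SG atoms that lie within a distance cutoff. The 2.5 Å cutoff is an assumption chosen to bracket the typical S-S bond length of about 2.05 Å, and the coordinates in the demo are hypothetical.

```python
import math
from itertools import combinations


def find_disulfides(sg_coords, cutoff=2.5):
    """Return (i, j) residue-number pairs whose Cys SG atoms lie within cutoff (Å).

    sg_coords maps a cysteine residue number to the (x, y, z) of its SG atom.
    """
    pairs = []
    # Sorting by residue number guarantees i < j in every returned pair.
    for (i, a), (j, b) in combinations(sorted(sg_coords.items()), 2):
        if math.dist(a, b) <= cutoff:
            pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    # Hypothetical coordinates: Cys10 and Cys25 are 2.05 Å apart, Cys40 is far away.
    coords = {10: (0.0, 0.0, 0.0), 25: (2.05, 0.0, 0.0), 40: (10.0, 0.0, 0.0)}
    print(find_disulfides(coords))  # [(10, 25)]
```

A real check would read SG coordinates from the model's PDB file and would usually also consider bond angles, which this sketch ignores.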

Table 2 Promotif documentation of all Mfps, generated in PDBsum server.

Signal peptide predictions of Mfps

Signal peptides in the Mfps were predicted with the Phobius and SignalP 5.0 servers. The Mizuhopecten yessoensis foot proteins Myfp1 v1 and v2 lack a signal peptide region, whereas all other Mfps have a signal peptide (approximately residues 1–20). The signal peptide largely determines the efficiency of protein secretion into the extracellular region, and signal peptides are highly heterogeneous in nature16 (Table 3).

Table 3 Physicochemical characterization of Mfps (generated with ExPASy ProtParam), with signal peptide prediction (generated with Phobius and SignalP 5.0).

Accessible surface area (ASA) of Mfps

The ASA of each Mfp is unique because the size of the fps varies between species. Because hydrophobicity is very important in wet adhesion5, the percentage of side-chain ASA that is hydrophobic was analyzed for all Mfps. Among all Mfps, Pvfp1 v1 showed the highest percentage of hydrophobic side-chain ASA, followed by Pvfp1 v2 (Table 4).
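The percentage of hydrophobic side-chain ASA is the hydrophobic share of the total side-chain ASA. A minimal sketch, assuming per-residue side-chain ASA values are already available from a structure analysis and assuming the hydrophobic set A, V, I, L, M, F, W, C; VADAR's own residue classification may differ.

```python
# Hydrophobic residue set is an assumption; VADAR's own definition may differ.
HYDROPHOBIC = set("AVILMFWC")


def pct_hydrophobic_side_asa(residues):
    """residues: iterable of (one-letter code, side-chain ASA in Å^2)."""
    residues = list(residues)
    total = sum(asa for _, asa in residues)
    if total == 0:
        return 0.0
    hydrophobic = sum(asa for aa, asa in residues if aa in HYDROPHOBIC)
    return 100.0 * hydrophobic / total


if __name__ == "__main__":
    # Hypothetical per-residue side-chain ASA values.
    print(pct_hydrophobic_side_asa([("A", 30.0), ("K", 50.0), ("L", 20.0)]))  # 50.0
```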

Table 4 Accessible surface area of Mfps generated in VADAR server. (ND: Not detected).

Functional characterization of Mfps

For functional characterization, the FFPred 3 server was used to predict Gene Ontology (GO) terms17 in three categories: biological process, cellular component and molecular function. This is the first attempt to predict the molecular functions, cellular components and biological processes of Mfps. For biological process, most Mfps are predicted to be involved in the cell surface receptor signaling pathway (GO:0007166), and the cellular component prediction places all Mfps in the extracellular region (GO:0005576). Each Mfp showed a unique set of predicted functions. Although wet adhesion is the core feature of all known Mfps5,9, the proteins also showed other predicted features. G-protein coupled receptor activity (GO:0004930) is the molecular function common to all Mfps (Supplementary Data S3, Tables 3, 4 and 5).
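Statements such as "common to all Mfps" and "most Mfps" amount to a set intersection and a frequency count over the predicted GO terms. The sketch below assumes the FFPred 3 predictions have already been parsed into a dictionary of GO ID sets per protein; the protein names and annotations in the demo are hypothetical.

```python
from collections import Counter


def go_summary(predictions):
    """predictions: dict mapping protein name -> set of predicted GO IDs.

    Returns the GO terms shared by every protein and a Counter of how many
    proteins carry each term.
    """
    term_sets = list(predictions.values())
    shared = set.intersection(*term_sets) if term_sets else set()
    counts = Counter(term for terms in term_sets for term in terms)
    return shared, counts


if __name__ == "__main__":
    # Hypothetical annotations for three proteins; real input would be FFPred 3 output.
    demo = {
        "protein_A": {"GO:0004930", "GO:0008083"},
        "protein_B": {"GO:0004930", "GO:0008270"},
        "protein_C": {"GO:0004930"},
    }
    shared, counts = go_summary(demo)
    print(shared)                # {'GO:0004930'}
    print(counts["GO:0008270"])  # 1
```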

Among the Mytilus californianus foot proteins (Mcfps), Mcfp1 showed growth factor activity (GO:0008083). Mcfp2 exhibited nine molecular functions, including zinc ion binding (GO:0008270) and endopeptidase activity (GO:0004175). Mcfp3 has 11 variants, all of which showed receptor activity (GO:0004872), G-protein coupled receptor activity (GO:0004930) and peptidase inhibitor activity (GO:0030414); all variants except Mcfp3 v1 also showed enzyme inhibitor activity (GO:0004857). Purine nucleoside binding (GO:0001883) and catalytic activity (GO:0003824) were predicted for Mcfp4 v1 and v2, respectively. Mcfp6 v2, Mcfp11, Mcfp15, Mcfp16 and Mcfp18 exhibited zinc ion binding (GO:0008270), and Mcfp9 showed cofactor binding (GO:0048037).

The Atrina pectinata foot protein (Apfp1) showed growth factor (GO:0008083) and cytokine (GO:0005125) activity. The Dreissena polymorpha foot protein (Dpfp1) showed growth factor and G-protein coupled receptor activity. Among the Mytilus edulis foot proteins (Mefps), Mefp2 exhibited zinc ion binding (GO:0008270) and showed more predicted molecular functions than Mefp1. Among the Mytilus galloprovincialis foot proteins (Mgfps), all proteins and their variants showed G-protein coupled receptor binding activity, and Mgfp3 v1 and v2 showed peptidase inhibitor activity.

The Mytilus unguiculatus foot proteins (Mufps) comprise three types of protein and their variants. Mufp2 showed transmembrane signaling receptor activity, endopeptidase activity, signal transducer activity, serine hydrolase activity, cytokine activity and zinc ion binding. Mufp3 and its variants exhibited peptidase inhibitor activity and G-protein coupled receptor activity. All Mufp6 proteins and variants showed cytokine activity, and Mufp6 together with its variants v3 and v9 exhibited zinc ion binding. All variants except Mufp6 itself showed cytokine receptor binding activity.

The Mizuhopecten yessoensis foot proteins (Myfps) showed DNA binding (GO:0003677), cytoskeletal protein binding (GO:0008092) and nucleic acid binding (GO:0003676), and Myfp1 v2 showed sequence-specific DNA binding transcription factor activity. Perna canaliculus foot protein 1 (Pcfp1) has four variants, all of which except Pcfp1 v1 showed poly(A) RNA binding (GO:0044822). Each Perna viridis foot protein (Pvfp) has unique molecular functions: Pvfp1 v1 and Pvfp6 showed G-protein coupled receptor binding, Pvfp1 v2 showed glycosaminoglycan binding (GO:0005539), and Pvfp3 and Pvfp5 exhibited zinc ion binding.

Chemical structural evaluation of Mfps

In the amino acid composition of Apfp1, the major residues are lysine (15.3%) and proline (15.1%). Most residues are neutral, with a positive charge cluster spanning residues 121–146 (KKPPVYKPKKPVYKPKKRPAYKPKKK); mixed and negative charge clusters are absent from Apfp1. Among the core tandem repeat blocks, PPVD, KPPV and PDYKP each occur twice and YKPKK occurs three times. The most abundant residues in Dpfp1 are proline (22.3%) and tyrosine (14.9%), and charge cluster analysis revealed no positive, negative or mixed charge clusters in this protein. Its tandem repeat blocks FTTK, PVYPT, PVYPY, PVYPP and PEYP each occur twice, and PVYP occurs four times.
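Residue percentages and tandem repeat counts like those above can be reproduced from a sequence with standard-library Python. This sketch is illustrative: the text does not state which tool produced the composition and repeat data, and `str.count` counts only non-overlapping exact matches, which may differ from a dedicated repeat finder. The demo uses only the Apfp1 charge cluster quoted above, not the full protein.

```python
from collections import Counter


def composition(seq):
    """Percentage of each residue in seq, rounded to 0.1%."""
    seq = seq.upper()
    n = len(seq)
    return {aa: round(100 * c / n, 1) for aa, c in Counter(seq).items()}


def count_repeat(seq, motif):
    """Number of non-overlapping exact occurrences of motif in seq."""
    return seq.upper().count(motif.upper())


if __name__ == "__main__":
    # Positive charge cluster of Apfp1 (residues 121-146), quoted in the text.
    cluster = "KKPPVYKPKKPVYKPKKRPAYKPKKK"
    comp = composition(cluster)
    print(comp["K"], comp["P"])             # 46.2 26.9
    print(count_repeat(cluster, "YKPKK"))   # 3
```

Within this 26-residue cluster alone, lysine accounts for 12 residues and YKPKK occurs three times.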

In Mcfp1 v1, proline accounts for 23.6% of residues, followed by tyrosine (18.7%), and no positive, negative or mixed charge clusters are present. Interestingly, the periodic element YK.K...YPP. occurs in 66 copies between positions 82 and 741. Compared with v1, Mcfp1 v2 contains 23.2% proline, followed by lysine (20.1%) and tyrosine (18.6%). Like v1, v2 contains no charge clusters; KKSYPPAYK is tandemly repeated four times, and 60 copies of the periodic element YK.K...YPP. occur between positions 82 and 681. Cysteine (14.4%) is the most abundant residue in Mcfp2, followed by glycine (13.5%); this protein lacks charge clusters, and PCKN is tandemly repeated five times. Mcfp3 has 11 variants. In v1, glycine is the most abundant residue (18.2%), followed by tyrosine (15.2%); YPRG occurs twice and there are no charge clusters. Glycine (17.6%) and tyrosine (14.7%) are the most abundant residues in v2, which also lacks charge clusters; its only tandem repeat, GWNK, occurs twice. The most abundant residues in v3 are glycine (20.5%) and tyrosine (17.9%), and this variant contains neither tandem repeat blocks nor charge clusters. The tyrosine contents of v4, v5, v6, v7, v8, v9, v10 and v11 are 19.2, 19.2, 16.2, 18.7, 17.4, 15.9, 15.9 and 15.9%, respectively, and none of these variants shows tandem repeat regions or charge clusters. Mcfp4 has two variants with the same tyrosine content (2.2%); histidine is the most abundant residue in both v1 (23.0%) and v2 (23.8%). Negative charge clusters occur in v1 at residues 486–536 and 565–628 (DLSNDLHPDNNIEQIANDHVNDIAQSTDGDINDFADTHYNDVAPIADVHVD), and in v2 at residues 526–576 and 599–668 (DLSNDLHPDNNIEQIANDHVNDIAQSTDGDINDFADTHYNDVAPIADVHVD).
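Periodic elements such as YK.K...YPP. read naturally as patterns in which each "." stands for any single residue, which is also how Python's `re` module interprets ".". Assuming that reading, here is a minimal sketch for counting copies within a 1-based residue range; the function name is hypothetical and the demo sequence is synthetic.

```python
import re


def count_periodic_element(seq, pattern, start=1, end=None):
    """Count non-overlapping matches of a wildcard pattern in seq[start..end].

    Positions are 1-based and inclusive; '.' matches any single residue.
    """
    region = seq[start - 1:end]
    return len(re.findall(pattern, region))


if __name__ == "__main__":
    # Synthetic sequence: two copies of an 11-residue match after a 2-residue prefix.
    seq = "GG" + "YKAKAAAYPPA" * 2
    print(count_periodic_element(seq, "YK.K...YPP.", start=3, end=24))  # 2
```

`re.findall` returns non-overlapping matches; wrapping the pattern in a lookahead, `(?=(...))`, would count overlapping copies instead.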
HRHVH is tandemly repeated six times in v1 and HVHRH three times in v2, and 39 copies of the periodic element H.HVH.H.VL occur between positions 50 and 439. Tyrosine (20.8%) is abundant in Mcfp5, which also contains a positive charge cluster at residues 53–89 (KGKYYGKGKKYYYKYKRTGKYKYLKKARKYHRKGYKK). Three Mcfp6 variants are currently reported, with tyrosine contents of 18.2% (v1), 16.5% (v2) and 15.5% (v3); none of them contains charge clusters. Mcfp7 v1 and v2 have almost identical amino acid compositions and chemical structural characteristics, with tyrosine contents of 10.2% and 9.2%, respectively. Glycine (27.5%) is the most abundant residue in Mcfp8, which has no charge clusters or tandem repeats. Glycine and histidine are the most abundant residues in both variants of Mcfp9, and GGHH occurs four times in v2 and twice in v1. Each of Mcfp10–18 has a distinct amino acid composition. Mcfp11 contains a mixed charge cluster at residues 357–386 (ENQHKRHLREREYQNKRHLSNEEHLHNKHE), Mcfp12 contains a positive charge cluster at residues 231–257 (RFRRFKIRHGRFRYGGKYYKLSCNKRR), and no charge clusters were found in the other proteins.

Two Mefps have been reported; tyrosine accounts for 18.2% of Mefp1 and 7.3% of Mefp2. The tandem repeats in Mefp1 are PVYKP (twice) and YKPKI (four times), and those in Mefp2 are GKTGYKC (twice), KPNPC (seven times), NACKPN (five times), VCSPNP (five times) and KPNPC (three times). Neither protein contains charge clusters.

Three foot proteins have been reported for Mgfp (Mgfp1, Mgfp3 v1 and Mgfp3 v2). Tyrosine content is 19.0% in Mgfp1 and 14.3% in both Mgfp3 variants. The Mgfp3 variants lack tandem repeats, whereas Mgfp1 contains TYKPKPSYPATYKSKSSY (three times) and TYKPKPSYPATYKSKSSYPSSYKPKKTY (three times). None of the Mgfps contains charge clusters.

The major Mufps are fp2, fp3 (15 sequences: fp3 and its variants v1–v14) and fp6 (ten sequences: fp6 and its variants v1–v9). In Mufp2, lysine is the most abundant residue (14.1%), followed by cysteine (13.4%); the protein contains a negative charge cluster at residues 18–37 (TAPTTQYDDDEDDYKPDTAY), and KPNPC occurs as a tandem repeat four times. In Mufp3, glycine (20.8%) is the most abundant residue, followed by tyrosine (18.2%), and the protein contains no charge clusters or tandem repeat blocks. Glycine is also the most abundant residue in each fp3 variant: v1 (20.5%), v2 (20.5%), v3 (15.2%), v4 (20.5%), v5 (20.5%), v6 (19.5%), v7 (20.5%), v8 (21.8%), v9 (20.5%), v10 (20.5%), v11 (20.0%), v12 (20.5%), v13 (22.5%) and v14 (20.8%). Their tyrosine contents are v1 (19.2%), v2 (19.2%), v3 (13.6%), v4 (19.2%), v5 (17.9%), v6 (19.5%), v7 (16.7%), v8 (14.1%), v9 (17.9%), v10 (12.8%), v11 (16.2%), v12 (17.9%), v13 (7.5%) and v14 (15.3%), and none of these variants contains charge clusters or tandem repeat blocks. In Mufp6, tyrosine (19.7%) is the most abundant residue, followed by glycine (12.3%); the protein has no charge clusters, and its tandem repeats are NCNSYAGCCL (twice) and YCTNKGC (twice). Tyrosine is the most abundant residue in each of the nine fp6 variants: v1 (19.5%), v2 (21.1%), v3 (19.5%), v4 (20.3%), v5 (20.3%), v6 (19.5%), v7 (19.6%), v8 (21.6%) and v9 (20.8%). Perfectly matched tandem repeat blocks occur in v2 (RGYC, twice) and v5 (RGYC, twice), and no variant of fp6 contains charge clusters.
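A charge cluster is a stretch with an unusually high density of charged residues. The text does not name the charge-cluster tool, and the sketch below is a simplified sliding-window heuristic rather than that tool's statistic. It assumes K and R as positive and D and E as negative (His treated as neutral), and its window size and threshold are illustrative defaults. The demo uses the Mcfp5 (residues 53–89) and Mufp2 (residues 18–37) clusters quoted above.

```python
POSITIVE = set("KR")  # His is treated as neutral here (an assumption)
NEGATIVE = set("DE")


def net_charge(seq):
    """Count of K/R minus count of D/E."""
    return sum(aa in POSITIVE for aa in seq) - sum(aa in NEGATIVE for aa in seq)


def charged_windows(seq, window=20, threshold=6):
    """Return (start, end, net) for every window with |net charge| >= threshold.

    Coordinates are 1-based and inclusive. Window size and threshold are
    illustrative defaults, not the parameters of any published method.
    """
    hits = []
    for i in range(len(seq) - window + 1):
        net = net_charge(seq[i:i + window])
        if abs(net) >= threshold:
            hits.append((i + 1, i + window, net))
    return hits


if __name__ == "__main__":
    mcfp5_cluster = "KGKYYGKGKKYYYKYKRTGKYKYLKKARKYHRKGYKK"  # Mcfp5, 53-89
    mufp2_cluster = "TAPTTQYDDDEDDYKPDTAY"                   # Mufp2, 18-37
    print(net_charge(mcfp5_cluster), net_charge(mufp2_cluster))  # 18 -6
    print(charged_windows(mufp2_cluster))                        # [(1, 20, -6)]
```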

Myfp1 has two variants: threonine (37.4%) is the most abundant residue in v1, and glycine (26.9%) in v2. Tyrosine content is 0.4% in v1 and 4.0% in v2. Only v2 contains a mixed charge cluster, at residues 2–24 (DAGFEALKKIIVRMDETERYKRR). The tandem repeat blocks in v1 are TSQTDT (nine times), TDTTQN (ten times), TDTTQNTTSQ (five times), QNTTSQ (eight times), NTTSQT (nine times), TSQTDTR (three times), TSQTDTT (five times), RQNTTP (nine times), DITQN (twice) and TSQTDTK (twice). Those in v2 are YGLG (seven times), YGLGQSPG (six times), YGLGQSPGTGYWLGQSPGTG (four times) and YGLGQSPGTGYWLGQSPGTGYGLGQSPGTVYGLGQSPGTGYWLG (three times).

Four variants of Pcfp1 (v1–v4) are currently reported. Lysine is the most abundant residue in all of them: v1 (23.4%), v2 (23.4%), v3 (24.3%) and v4 (24.2%). Tyrosine content is 20.2% (v1), 20.1% (v2), 21.0% (v3) and 20.9% (v4). The tandem repeat KPYV occurs 88 times in v1, 87 times in v2, 91 times in v3 and 87 times in v4, and none of the variants contains charge clusters.

The Pvfps currently available are fp1, fp3, fp5 and fp6. Pvfp1 has two variants: proline is the most abundant residue in v1 (19.6%), followed by alanine (18.7%), and likewise in v2 (proline 19.5%, alanine 18.6%). Tyrosine content is 1.4% in v1 and 1.6% in v2. The tandem repeat blocks in v1 are HPPSWTAWIA (4 times), WTAWKAHPPAWTAWK (5 times), PPPAWTAWK (8 times), GKPGKPG (3 times) and PPPAWTAWKATLKPWTAWKATPKPWTAWKATPKPWTAWKATPKPWTAWK (3 times); those in v2 are PPPAWTAWK (9 times), GKPG (4 times) and PPAWTAWKATPKPWTAWKAP (4 times). Neither variant contains charge cluster regions. In Pvfp3, cysteine (15.7%) is the most abundant residue, followed by phenylalanine (8.6%); tyrosine accounts for 4.3%, and there are no charge clusters or tandem repeat blocks. Cysteine (16.5%) is also the most abundant residue in Pvfp5, which contains 14.2% tyrosine, the tandem repeats GYYGKNCQ (2 times), TCKC (2 times) and CLNGG (2 times), and no charge clusters. In Pvfp6, cysteine (13.9%) is the most abundant residue, followed by glycine (10.7%), with 6.6% tyrosine and no tandem repeat blocks or charge clusters.

Apart from tyrosine, glycine and lysine are the major components of the Mfps (detailed data in Supplementary File S4). Multiple Dopa-lysine pairs have recently been shown to contribute critically to underwater adhesion5,6,14,18. The polymorphism of Mfps may reflect the versatility of adhesion, since different forms of an adhesive protein can interact with different surfaces.

Physicochemical characterization of Mfps

The physicochemical properties of each protein were computed with the ExPASy ProtParam server (Table 3 and Fig. 3), which helped group the Mfps. All Mfps have estimated half-lives of more than 10 hours in E. coli and more than 20 hours in yeast. Mcfp1 v1 and v2 have the same isoelectric point (pI 10.04) but different molecular weights (85024.22 Da for v1 and 78048.08 Da for v2). The instability index (II), aliphatic index (AI), extinction coefficient (EC) and grand average of hydropathicity (GRAVY) are 41.20, 26.57, 204255 and −1.357 for v1, and 40.93, 28.07, 186375 and −1.329 for v2. The corresponding values for Mcfp2 are pI 9.04, Mw 46753.52 Da, II 14.42, AI 27.80, EC 58545 and GRAVY −0.890. Among the 11 Mcfp3 variants, the highest pI is found in v1 (10.09), followed by v2 (10.05), and the lowest in v7 (7.88). All variants of Mcfp4, Mcfp7 and Mcfp13 have pI values above 10.00. Most Mcfps have II values below 40, except the variants of fp1, fp4, fp6, fp12, fp13, fp14, fp15 and fp18. All Mcfps have negative GRAVY values, indicating an overall hydrophilic (polar) character.
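Several ProtParam quantities follow from simple formulas: GRAVY is the mean Kyte-Doolittle hydropathy per residue, the aliphatic index is X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)) with X in mole percent, and the 280 nm extinction coefficient sums 5500 per Trp, 1490 per Tyr and 125 per cystine. The sketch below reimplements these for standard one-letter sequences. It is not ProtParam itself, it omits pI, instability index and half-life, and the demo sequences are toy examples.

```python
# Kyte-Doolittle hydropathy values, as used for GRAVY.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    return sum(KD[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), with X in mole percent."""
    n = len(seq)

    def mole_pct(aa):
        return 100.0 * seq.count(aa) / n

    return mole_pct("A") + 2.9 * mole_pct("V") + 3.9 * (mole_pct("I") + mole_pct("L"))


def extinction_coefficients(seq):
    """Molar extinction coefficients at 280 nm (M^-1 cm^-1) in water.

    Returns (all Cys pairs as cystines, all Cys reduced).
    """
    reduced = seq.count("W") * 5500 + seq.count("Y") * 1490
    return reduced + (seq.count("C") // 2) * 125, reduced


if __name__ == "__main__":
    print(round(gravy("AAKK"), 3))            # -1.05
    print(round(aliphatic_index("AVIL"), 2))  # 292.5
    print(extinction_coefficients("WYCC"))    # (7115, 6990)
```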

Figure 3

Hydrophobic contour map of mussel foot proteins (Mfps). Color code: red, high/positive; light, neutral; blue, low/negative hydrophobicity. Generated in EzMol 2.1.

Mefp1 and Mefp2 have similar pI and AI values but differ markedly in molecular weight: Mefp2 (54459.22 Da) is much larger than Mefp1 (6467.22 Da). Mefp1 (GRAVY −1.280) is more hydrophilic than Mefp2 (GRAVY −0.896).

The Mufps form the most highly polymorphic group. The highest pI is found in Mufp3 v3 (10.33), followed by Mufp3 v14 (10.05). Of the 26 Mufps, 15 have molecular weights below 10000 Da. The II values of fp3 v3 (42.06), fp6 v3 (40.29), fp6 v4 (40.58) and fp6 v8 (40.19) lie just above 40, the threshold above which ProtParam predicts that a protein may be unstable. The most negative GRAVY values, indicating the most hydrophilic proteins, are found in fp6 v9 (−0.938), fp3 v3 (−0.889) and fp2 (−0.842), whereas fp3 v10 has the least negative value (−0.110).

Apfp1 and Dpfp1 are the least polymorphic and least explored groups. Apfp1 has pI 9.47, Mw 40382.27 Da, II 33.47, AI 57.76, EC 121185 and GRAVY −0.662. Dpfp1 has pI 5.24, Mw 49361.70 Da, II 53.65, AI 31.40, EC 122860 and GRAVY −1.331. The two Myfp1 variants (v1 and v2) have entirely different physicochemical features: v1 has pI 5.12, Mw 56961.06 Da, II 23.62, AI 6.97, EC 8480 and GRAVY 2.038, whereas v2 has pI 8.66, Mw 33579.64 Da, II 62.60, AI 56.48, EC 118370 and GRAVY −0.615.

Among all Mfps, the four Pcfp1 variants stand out with negative instability indices (v1 = −10.03, v2 = −9.97, v3 = −10.61 and v4 = −10.38) and high AI values (v1 = 65.96, v2 = 65.90, v3 = 66.83 and v4 = 66.60). Molecular weight is highest in v1 (51565.91 Da) and lowest in v4 (46301.65 Da). v1 and v2 share the same hydropathicity, as do v3 and v4. In Pvfp, the two fp1 variants (v1 and v2) share the same physicochemical properties. Unlike the other Pvfps, Pvfp3 (GRAVY 0.260) and Pvfp6 (GRAVY 0.102) have positive GRAVY values, indicating an overall hydrophobic character. Except for Pvfp3 (II 44.26), all Pvfps are predicted to be stable, with II values below 40.
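The interpretation rules used in this section can be stated explicitly: ProtParam predicts a protein with an II below 40 to be stable, and a positive GRAVY indicates overall hydrophobicity while a negative value indicates hydrophilicity. A minimal sketch applying these rules to values quoted in this section; the function names are hypothetical, and a GRAVY of exactly 0 is treated as hydrophilic for simplicity.

```python
def stability_class(instability_index):
    """ProtParam convention: II < 40 predicts a stable protein."""
    return "stable" if instability_index < 40 else "possibly unstable"


def hydropathy_class(gravy_score):
    """Positive GRAVY indicates a hydrophobic protein, otherwise hydrophilic."""
    return "hydrophobic" if gravy_score > 0 else "hydrophilic"


if __name__ == "__main__":
    # (II, GRAVY) pairs quoted in this section.
    reported = {
        "Mcfp1 v1": (41.20, -1.357),
        "Mcfp2": (14.42, -0.890),
        "Apfp1": (33.47, -0.662),
        "Dpfp1": (53.65, -1.331),
        "Pvfp3": (44.26, 0.260),
    }
    for name, (ii, gv) in reported.items():
        print(name, stability_class(ii), hydropathy_class(gv))
```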

This is the first physicochemical, structural and functional characterization of all Mfps. All Mfps except Pvfp3 and Pvfp6 have negative GRAVY values and are therefore hydrophilic overall, whereas Pvfp3 and Pvfp6 are more hydrophobic. In Mytilus spp., Mfp-3f is a polar protein that may play a vital role in adhesion to metal and mineral surfaces7,18,19.

Ion ligand binding sites of Mfps

Understanding the general ligand-binding properties of protein sites is important for understanding the functional diversity of the Mfps. One fundamental feature of an Mfp's receptor surface is the set of amino acids available for interaction with ligands. Because the stabilization and cross-linking of Mfps are mainly mediated by metal ions, characterizing their metal and acid radical ion binding ability helps explain their functional diversity. (Detailed predicted binding residues for each Mfp are provided in Supplementary File S5.)

Foot protein: Apfp1. Three ion binding types were identified in this protein: Zn2+, Ca2+ and Na+. Of these, 67 sites are predicted for Zn2+, 22 for Na+ and three for Ca2+ (I249 E255 E341). Unlike Ca2+, Zn2+ and Na+ bind to tyrosine residues and may associate with DOPA to aid cross-linking. No binding sites were detected for the following ions: Cu2+, Fe2+, Fe3+, Mg2+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−.

Foot protein: Dpfp1. Four metal ions are predicted to bind this protein: Zn2+ (44 sites); Ca2+ (ten sites): P174 N243 D267 K269 D289 D293 G316 P317 P402 Y403; Na+ (one site): C8; and K+ (three sites): D267 P272 I276. No binding sites were detected for the following ions: Cu2+, Fe2+, Fe3+, Mg2+, Mn2+, CO32−, NO2−, SO42−, PO43−.
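Residue lists such as the Dpfp1 Ca2+ sites can be parsed mechanically to count sites and tally residue preferences (for example, whether Zn2+ sites fall mostly on histidine). A minimal sketch, assuming sites are written as a one-letter residue code followed by its position and separated by spaces, as in the text; the helper names are hypothetical.

```python
import re
from collections import Counter

# One uppercase residue code followed by its sequence position, e.g. "D267".
SITE = re.compile(r"\b([A-Z])(\d+)\b")


def parse_sites(text):
    """Parse 'P174 N243 ...' into [('P', 174), ('N', 243), ...]."""
    return [(aa, int(pos)) for aa, pos in SITE.findall(text)]


def residue_preference(text):
    """Count how often each residue type appears among the binding sites."""
    return Counter(aa for aa, _ in parse_sites(text))


if __name__ == "__main__":
    dpfp1_ca = "P174 N243 D267 K269 D289 D293 G316 P317 P402 Y403"  # Dpfp1 Ca2+ sites
    print(len(parse_sites(dpfp1_ca)))  # 10
    prefs = residue_preference(dpfp1_ca)
    print(prefs["D"], prefs["P"])      # 3 3
```

The count of ten parsed sites matches the number stated for Dpfp1; the same check can be run on any site list in this section.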

Foot proteins: Mcfps. The ligand-binding sites of each Mcfp and its variants are highly distinctive. In Mcfp1 v1, 145 binding sites are predicted for Zn2+, most of them at histidine residues. Only two Ca2+ binding sites are present (D458 E461). Approximately 292 Na+ binding sites are present, so Na+ accounts for roughly two-thirds of all predicted ion-binding sites in this protein; Na+ also binds mostly to histidine. The K+ binding sites are M75 S79 I82 M83 H86 L99 H102 V103 V108. No binding sites were detected for the following ions: Cu2+, Fe2+, Fe3+, Mg2+, Mn2+, CO32−, NO2−, SO42−, PO43−. Mcfp1 v2 has approximately 269 Zn2+ binding sites, with histidine and tyrosine as the most common binding residues. Its other metal binding sites are Cu2+: H150 H152; Ca2+: D558 D561; Mn2+: H212 Q257 H260 H262; K+: H50 S51; and Na+: approximately 186 sites. No binding sites were detected for the following ions: Fe2+, Fe3+, Mg2+, CO32−, NO2−, SO42−, PO43−. Compared with v1, v2 binds a broader range of ions (Zn2+, Cu2+, Ca2+, Mn2+, K+ and Na+). Na+ sites predominate in v1, whereas Zn2+ sites predominate in v2.

Mcfp2: three metal ions and one acid radical ion are predicted to bind this protein. Zn2+ (~221 sites) binds most widely, followed by Na+ (~103 sites) and Ca2+ (97 sites), and all three bind a variety of residue types throughout the protein. The acid radical ion SO42− binds at K206 R220 P221. No binding sites were detected for the following ions: Cu2+, Fe2+, Fe3+, Mg2+, Mn2+, K+, CO32−, NO2−, PO43−.

Each of the 11 Mcfp3 variants has unique predicted ligand binding sites. In v1, three metal ions and two acid radical ions are predicted to bind: Zn2+ (11 sites): Q21 D23 Y28 Y38 K39 N43 Y45 R47 Y50 W56 W61; Ca2+ (seven sites): L12 I15 G26 N27 G48 Y50 G51; Mg2+ (two sites): R60 W61; CO32− (five sites): L12 V13 I15 R63; and SO42− (11 sites): D23 R47 W52 W56 K57 K58 G59 R60 W61 K64 Y65. No binding sites were detected for the following ions: Cu2+, Fe2+, Fe3+, Mn2+, Na+, K+, NO2−, PO43−. Five metal ions are predicted to bind v2: Zn2+ (14 sites): Q21 D23 K28 Y34 Y38 G39 Y42 Y48 R50 Y52 W54 K56 W58 W63; Ca2+ (three sites): G16 S22 G61; Mg2+ (two sites): Y27 K41; Na+ (one site): Y48; and K+ (five sites): K2 S3 S5 I6 L9. No binding sites were detected for the following ions: Cu2+, Fe2+, Fe3+, Mn2+, CO32−, NO2−, SO42−, PO43−. In v3, the predicted binding sites are Zn2+ (23 sites): K3 Q21 D23 Y26 Y28 Y38 N39 Y42 Y45 N46 G47 Y48 Y51 H52 Y55 G56 W57 K59 W61 N62 W66 Y70 Y71; Ca2+ (six sites): G56 W57 N58 G60 W61 N62; and Na+ (three sites): Y42 Y48 Y51. No binding sites were detected for the following ions: Cu2+, Fe2+, Fe3+, Mg2+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−. The predicted binding sites of v4 are Zn2+ (19 sites): K3 Q21 D23 D27 Y28 Y38 N39 Y45 Y48 Y51 H52 Y55 G56 K59 W61 N62 W66 Y70 Y71; Ca2+ (six sites): D23 D27 W57 N58 W61 G64; Na+ (two sites): Y48 Y51; K+ (two sites): D23 D27; and the acid radical PO43− (two sites): N58 G64. No binding sites were detected for the following ions: Cu2+, Fe2+, Fe3+, Mg2+, Mn2+, CO32−, NO2−, SO42−.
In v5, the predicted binding sites are Zn2+ (20 sites): K3 Q21 D23 Y28 Y38 N39 Y43 Y45 Y48 Y51 H52 Y55 G56 W57 K59 W61 N62 W66 Y70 Y71; Ca2+ (six sites): G56 W57 N58 G60 W61 N62; and Na+ (two sites): Y48 Y51. No binding sites were detected for the following ions: Cu2+, Fe2+, Fe3+, Mg2+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−. In v6: Zn2+ (16 sites): K3 Q21 D23 Y28 Y44 G45 Y48 Y51 K52 Y54 R56 Y58 K62 W64 W68 W73; Ca2+ (five sites): D29 G57 Y58 G59 N61; Mg2+ (two sites): G46 K47; Na+ (one site): Y48; and the acid radicals CO32− (two sites): L30 Y32 and SO42− (six sites): Y28 N49 K52 G57 G71 R72. No binding sites were detected for the following ions: Cu2+, Fe2+, Fe3+, Mn2+, K+, NO2−, PO43−. In v7: Zn2+ (20 sites): K3 Q21 D23 Y26 Y28 D29 Y34 Y44 N45 Y48 Y51 Y54 Y57 H58 Y61 K65 W67 N68 N69 G70; Ca2+ (14 sites): F4 S5 T7 D23 N40 P41 W42 G53 N55 G56 Y57 W63 N64 W67; Mn2+ (two sites): N69 G70; Na+ (four sites): Q21 Y54 Y57 N64; and the acid radical CO32− (one site): Y32. No binding sites were detected for the following ions: Cu2+, Fe2+, Fe3+, Mg2+, K+, NO2−, SO42−, PO43−. In v8: Zn2+ (20 sites): K3 Q21 S22 D23 Y26 Y28 Y32 Y38 N39 Y43 Y45 Y48 Y51 H52 Y55 W57 K59 W61 N62 W66; Fe3+ (three sites): S22 D23 Y26; Mn2+ (eight sites): L17 A19 V20 S22 D23 A24 Y26 Y28; Na+ (three sites): Y42 Y48 Y51; and K+ (11 sites): I15 L17 F18 A19 V20 S22 D23 A24 Y26 Y28 Y32. No binding sites were detected for the following ions: Cu2+, Fe2+, Ca2+, Mg2+, CO32−, NO2−, SO42−, PO43−.
In v9: Zn2+ (20 sites): K3 Q21 D23 Y26 Y28 Y38 N39 Y42 Y43 Y45 N46 Y48 Y51 H52 Y55 W57 K59 W61 N62 W66; Mg2+ (two sites): D23 H52; Mn2+ (three sites): L9 L12 V13; Na+ (two sites): Y48 Y51; and the acid radicals SO42− (four sites): L12 V13 G16 Y38 and PO43− (four sites): F18 A19 V20 Y42. No binding sites were detected for the following ions: Cu2+, Fe2+, Fe3+, Ca2+, K+, CO32−, NO2−. In v10: Zn2+ (18 sites): K3 Q21 D23 Y26 Y28 Y38 N39 Y43 Y45 Y48 Y51 H52 Y55 W57 K59 W61 N62 W66; Ca2+ (seven sites): V13 I15 N39 G44 N46 Y55 W57; Mg2+ (two sites): D23 H52; and Na+ (three sites): Y42 Y48 Y51. No binding sites were detected for the following ions: Cu2+, Fe2+, Fe3+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−. In v11: Zn2+ (18 sites): K3 Q21 D23 Y28 Y38 N39 Y43 Y45 Y48 Y51 H52 Y55 G56 W57 K59 W61 N62 W66; Fe3+ (two sites): D23 G27; Ca2+ (two sites): D23 G27; Mg2+ (two sites): D23 H52; Mn2+ (two sites): D23 G27; and Na+ (five sites): Y42 Y43 Y45 Y48 Y51. No binding sites were detected for the following ions: Cu2+, Fe2+, K+, CO32−, NO2−, SO42−, PO43−. Among the Mcfp3 variants, only v8 and v11 have Fe3+ binding sites.

Ligand binding analysis of Mcfp5 predicted binding sites for three metal ions and one acid radical ion in total: Zn2+ binding site (28 sites): K2 C5 C18 D20 S23 D26 Y28 D30 Y32 Y33 N39 Y40 P41 G43 H45 G46 Y47 H48 G49 H50 Y52 K53 Y57 K59 H83 Y87 Y90 Y91; Mg2+ binding site: K85 G86; Na+ binding site: N39 S44 K53; and acid radical CO32− binding site: H48 G51 K53 Y56. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Ca2+, Mn2+, K+, NO2−, SO42−, PO43−.

Mcfp6 has three variants. v1 contains 37 binding sites for Zn2+, followed by Fe3+ binding site (three sites): S32 Y79 N113; Na+ binding site (18 sites): K34 C36 R37 G39 Y40 A64 C67 R75 P87 D88 F107 N108 C109 S111 Y112 N113 C115 C116; and acid radical SO42− binding site (six sites): S22 N45 C49 Y51 G52 S53. No binding site was detected for the following ions: Cu2+, Fe2+, Ca2+, Mg2+, Mn2+, K+, CO32−, NO2−, PO43−. v2 contains 35 binding sites for Zn2+, followed by Na+ binding site (14 sites): C36 R37 G39 Y40 A64 C67 N75 P87 Y107 D108 C109 S111 Y112 N113; and acid radical SO42− binding site (six sites): F11 I13 T14 C17 G18 I19. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Ca2+, Mg2+, Mn2+, K+, CO32−, NO2−, PO43−. v3 contains 34 binding sites for Zn2+, followed by Ca2+ binding site (two sites): Y79 Y99; Mg2+ binding site (three sites): N71 S75 T81; and Na+ binding site (18 sites): K34 C36 R37 G39 Y40 A64 C67 S75 P87 F90 Y107 D108 C109 S111 Y112 N113 C115 C116. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−. In all three variants, the first 18 amino acids showed similar Zn2+ binding sites.

The two variants of Mcfp7 showed similar Zn2+ binding positions. v1: Zn2+ binding site (17 sites): Y28 R29 R30 Y32 K33 G34 S35 H36 S37 G39 G40 H41 H44 G45 H49 Y51 Y55; Ca2+ binding site (four sites): G34 S35 Y57 K58; and Na+ binding site: S38 S42 Y51. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Mg2+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−. v2: Zn2+ binding site (21 sites): K27 Y28 Y32 Y35 K36 G39 S40 H41 S42 G44 G45 H46 S47 G49 G50 H51 H54 G55 G56 K57 Y61; Mg2+ binding site (two sites): S43 G44; Na+ binding site: H51; and acid radical CO32− binding site: S42 G56. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Ca2+, Mn2+, K+, NO2−, SO42−, PO43−.

Mcfp8: Zn2+ binding site (18 sites): P21 Y25 Y28 K36 Y37 K39 Y41 Y44 Y48 R51 Y52 H53 G55 K56 Y57 K60 Y61 K64; Ca2+ binding site (three sites): G46 K47 G66; Mg2+ binding site (two sites): V22 Y25; and Na+ binding site: Y41. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−.

In both Mcfp9 variants, Zn2+, Na+ and K+ can bind various amino acid residues, whereas Mg2+ binding is present only in v1 and Ca2+ binding only in v2. Mcfp9 v1 has 54 binding sites for Zn2+, followed by Mg2+ binding site (four sites): G35 H36 H108 H111; Na+ binding site (14 sites): D27 G32 K34 Y53 H54 V57 H60 V64 G65 H67 W76 G78 P79 A91; and K+ binding site (nine sites): F10 G21 D29 G35 H36 V37 L38 I41 I70. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Ca2+, Mn2+, CO32−, NO2−, SO42−, PO43−. In v2, 57 Zn2+ binding sites are predicted, together with Ca2+ binding site (two sites): V45 H47; Na+ binding site (12 sites): Y26 D27 G32 K34 G35 H36 L38 H54 G71 P72 S73 G93; K+ binding site (eight sites): Y23 G25 V37 L38 I41 V45 V62 I63; and acid radical PO43− binding site (three sites): H78 H80 V85. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Mg2+, Mn2+, CO32−, NO2−, SO42−. The Zn2+ binding sites of the two variants are almost identical.

Mcfp10–17 have no variants, and each protein showed a distinctive pattern of metal ion and acid radical binding. In Mcfp10, Zn2+ has 49 binding sites, there is an Mg2+ binding site (eight sites): V45 S46 T169 D210 D220 D221 Y299 D300, and Na+ has 33 binding sites. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Ca2+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−. In Mcfp11, 141 binding sites are available for Zn2+, there is a Cu2+ binding site (nine sites): H127 H129 H139 H143 H149 H159 H169 H173 H179, and 41 residues are available for Na+ binding. No binding site was detected for the following ions: Fe2+, Fe3+, Ca2+, Mg2+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−. In Mcfp12, approximately 139 residues are predicted to bind Zn2+. The other metal ion binding sites are Cu2+ binding site (three sites): H59 H81 H93; Ca2+ binding site (eight sites): N196 R409 D464 H465 L473 H494 I498 K532; Mg2+ binding site (four sites): D28 D32 S305 I479; and 55 predicted Na+ binding sites. Only one acid radical ion binds this protein, SO42− binding site (three sites): S34 V48 R91. No binding site was detected for the following ions: Fe2+, Fe3+, Mn2+, K+, CO32−, NO2−, PO43−. In Mcfp13, 23 residues showed Zn2+ binding ability, followed by 18 sites for Ca2+ binding, Mn2+ binding site (six sites): L16 I21 N22 G24 R25 R85, and Na+ binding site (three sites): D9 G71 G94. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Mg2+, K+, CO32−, NO2−, SO42−, PO43−. Mcfp14 contains binding sites for three metal ions (Zn2+, Ca2+ and Na+): 41 sites for Zn2+ and 10 sites each for Ca2+ and Na+. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Mg2+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−.
Mcfp15 has 70 residues for Zn2+ binding, Mg2+ binding site (four sites): S71 K166 K176 K180, K+ binding site (two sites): G15 S16, and 13 Na+ binding residues. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Ca2+, Mn2+, CO32−, NO2−, SO42−, PO43−. In Mcfp16, approximately 39 residues showed Zn2+ binding ability, followed by Ca2+ binding site (five sites): V11 V13 E21 E31 G32; Mg2+ binding site (three sites): V49 R50 S64; Mn2+ binding site (two sites): K27 H29; and Na+ binding site (seven sites): D41 C42 C44 H45 N46 C48 D58. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, K+, CO32−, NO2−, SO42−, PO43−. Mcfp17 showed 48 sites for Zn2+ binding. The other metal ion binding sites are Fe3+ binding site (nine sites): I61 D62 V63 G65 M66 E96 P97 Q98 W102; Ca2+ binding site (seven sites): R58 S60 D62 R69 K71 K72 S106; Mg2+ binding site: D62; Na+ binding site (five sites): G100 S114 P152 G194 C195; and K+ binding site: C20. Two acid radicals were also identified as binding this protein: SO42− binding site (four sites): I61 M66 L67 P93 and PO43− binding site (four sites): T64 G65 M66 L67. No binding site was detected for the following ions: Cu2+, Fe2+, Mn2+, CO32−, NO2−.

Mgfp1 showed binding ability for only two metal ions, Zn2+ and Na+: only one residue is available for Na+ binding (S22), whereas Zn2+ has 21 binding sites. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Ca2+, Mg2+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−. Mgfp3 has two variants; v2 showed binding sites for five metal ions, whereas v1 showed binding sites for only three. In v1, there are 12 sites for Zn2+ binding, followed by 20 sites for Ca2+ binding and Na+ binding site (two sites): Y38 Y50. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Mg2+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−. In v2, 18 sites were identified for Zn2+ binding, followed by Ca2+ binding site (seven sites): S25 D26 G54 Y55 G56 G57 Y58; Mg2+ binding site (four sites): N31 G33 G42 R46; Na+ binding site (two sites): W38 W61; and K+ binding site (two sites): D23 S25. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Mn2+, CO32−, NO2−, SO42−, PO43−.

Ligand binding analysis of the Mefps revealed that Mefp2 has the largest number of ligand-binding residues. Mefp1 binds only three kinds of metal ions: Zn2+ binding site (15 sites): C9 C12 T15 D17 H30 Y34 Y94 Y163 Y193 P217 Y275 P279 Y311 P351 Y357; Fe3+ binding site (two sites): K324 S494; and Mg2+ binding site: G3. No binding site was detected for the following ions: Cu2+, Fe2+, Ca2+, Mn2+, Na+, K+, CO32−, NO2−, SO42−, PO43−. In Mefp2, ~262 residues exhibited Zn2+ binding ability, followed by 135 binding sites for Na+. Mefp2 also has an acid radical SO42− binding site (five sites): S296 V302 C305 Y406 G408. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Mg2+, Mn2+, K+, CO32−, NO2−, PO43−.

In Mufp2, ~142 residues are available for Zn2+ binding; in addition, there is an Fe3+ binding site (two sites): P89 N94, 26 sites for Ca2+ binding and 46 binding sites for Na+. The acid radical CO32− binding site has two sites: N130 R131. No binding site was detected for the following ions: Cu2+, Fe2+, Mg2+, Mn2+, K+, NO2−, SO42−, PO43−. In Mufp3, 20 residues showed the ability to bind Zn2+, together with Na+ binding site (two sites): Y47 Y50. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Ca2+, Mg2+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−. Among the fp3 variants, v1 and v2 bind only metal ions, whereas v3 also binds an acid radical ion. The ligand-binding sites of v1 and v2 are almost identical. v1 has 23 sites for Zn2+ binding, followed by Ca2+ binding site (10 sites): Y29 N39 Y42 Y43 Y45 N46 G47 G50 W57 W61, and two sites for Na+ binding (Y48 Y51). No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Mg2+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−. v2 has 20 sites for Zn2+ binding, followed by 13 sites for Ca2+ binding and two Na+ binding sites (Y48 Y51). No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Mg2+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−. v3: Zn2+ binding site (12 sites): N3 Q21 D23 Y28 N32 Y38 K39 R47 Y50 W52 W56 W61; Mg2+ binding site (three sites): G62 R63 K64; Mn2+ binding site (three sites): A11 L14 I15; and acid radical CO32− binding site (two sites): I6 L10. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Ca2+, Na+, K+, NO2−, SO42−, PO43−. The other fp3 variants share similar ligand-binding sites.
In v4, 25 Zn2+ binding sites are predicted, followed by Ca2+ binding site (eight sites): G56 W57 N58 G60 W61 N62 Y70 L77; Mn2+ binding site (three sites): L9 L12 V13; and Na+ binding site (three sites): Y42 Y48 Y51. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Mg2+, K+, CO32−, NO2−, SO42−, PO43−. v5 has 26 Zn2+ binding sites, followed by Ca2+ binding site (seven sites): G56 W57 N58 G60 W61 N62 Y70 and Na+ binding site: Y42 Y48 Y51. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Mg2+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−. v6 has 24 Zn2+ binding sites, followed by Ca2+ binding site (six sites): G40 G43 Y44 S45 G46 G49 and Na+ binding site: Y38. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Mg2+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−. In v7, 23 sites are accessible for Zn2+ binding, followed by Ca2+ binding site (three sites): W57 W61 Y70 L77; Na+ binding site (three sites): Y42 Y48 Y51; and the acid radical binding sites CO32− (two sites): R54 G56 and SO42− (three sites): N46 K59 G68. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Mg2+, Mn2+, K+, NO2−, PO43−. v8 has 19 predicted sites for Zn2+, followed by Mg2+ binding site (two sites): Y51 K62 and Na+ binding site (three sites): Y42 Y48 Y51. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Ca2+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−. v9 binds five different metal ions (Zn2+, Ca2+, Mn2+, Na+ and K+): 20 residues can bind Zn2+, followed by Ca2+ binding site (three sites): W57 N58 W61; Mn2+ binding site (three sites): I6 L9 L10; Na+ binding site (two sites): Y48 Y51; and K+ binding site (six sites): L12 L14 N39 N46 W57 K59. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Mg2+, CO32−, NO2−, SO42−, PO43−.
v10 has 17 sites for Zn2+ binding and two sites for Na+ binding (Y48 Y51). No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Ca2+, Mg2+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−. v11 has 21 sites for Zn2+ binding, followed by five sites for Ca2+ binding (G56 N58 G60 N62 W66) and four sites for Na+ binding (Y48 Y51 G72 N73). No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Mg2+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−. v12 showed 22 sites for Zn2+ binding, followed by Ca2+ binding site (three sites): L17 V20 A24; Mg2+ binding site (two sites): Y48 Y51; Na+ binding site (two sites): Y48 Y51; K+ binding site: L17 V20 A24; and acid radical CO32− binding site (two sites): R54 G56. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Mn2+, NO2−, SO42−, PO43−. v13 has 16 Zn2+ binding sites; among the other metal ions, there are Na+ binding site (three sites): Y48 F51 G72 and K+ binding site (seven sites): L12 L14 G16 N39 N46 Y55 G57. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Ca2+, Mg2+, Mn2+, CO32−, NO2−, SO42−, PO43−. v14 has 14 sites for Zn2+ binding, followed by Ca2+ binding site (six sites): G25 G26 G51 Y52 G53 N55; Na+ binding site: Y48; and K+ binding site (five sites): R50 K56 G57 W58 N63. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Mg2+, Mn2+, CO32−, NO2−, SO42−, PO43−.

Mufp6 has 34 sites for Zn2+ binding, two Ca2+ binding sites (Y99 G111) and 16 sites for Na+ binding. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Mg2+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−. The fp6 protein has nine variants, and almost all of their ligand-binding sites are the same. v1 has 34 Zn2+ binding sites and 16 Na+ binding sites. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Ca2+, Mg2+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−. v2 has 42 Zn2+ binding sites, three Cu2+ binding sites and 13 Na+ binding sites. No binding site was detected for the following ions: Fe2+, Fe3+, Ca2+, Mg2+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−. v3 has 35 Zn2+ binding sites, one Ca2+ binding site (D28) and 15 Na+ binding sites. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Mg2+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−. v4 has 35 Zn2+ binding sites and 12 Na+ binding sites. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Ca2+, Mg2+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−. v5 has 36 Zn2+ binding sites and 13 Na+ binding sites. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Ca2+, Mg2+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−. v6 has 39 Zn2+ binding sites and 13 Na+ binding sites. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Ca2+, Mg2+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−. v7 has 31 Zn2+ binding sites, 10 Na+ binding sites and three K+ binding sites. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Ca2+, Mg2+, Mn2+, CO32−, NO2−, SO42−, PO43−. v8 has 33 Zn2+ binding sites, four Mg2+ binding sites and nine Na+ binding sites. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Ca2+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−.
The last variant, v9, has 32 Zn2+ binding sites, four Ca2+ binding sites and nine Na+ binding sites. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Mg2+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−.

The foot protein Myfp1 has two variants. In v1, 34 sites are predicted for Zn2+ binding, 11 sites for Ca2+ binding and three sites for Na+ binding (N195 T197 S198). No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Mg2+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−. In v2, 31 binding sites are available for Zn2+, six sites for Ca2+ and two sites for Mn2+ (T319 Y321), together with Na+ binding site: P317 and K+ binding site: L233 Q235. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Mg2+, CO32−, NO2−, SO42−, PO43−. Of the two fp1 variants, v2 binds a wider range of metal ions.

The foot protein Pcfp1 has four variants, some of whose metal ion binding sites are similar. In v1, 83 sites are available for Zn2+ binding, two sites for Fe3+ (K143 K323), six for Ca2+ (P236 Y237 P330 P350 Y351 H436) and three for Na+ (F10 K289 Y317). No binding site was detected for the following ions: Cu2+, Fe2+, Mg2+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−. v2 has 95 binding sites for Zn2+, eight for Ca2+ (P338 P350 Y355 K367 P376 Y377 K379 H432), two for Na+ (Y303 P306) and five for K+ (P174 K177 P178 V180 K181). No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Mg2+, Mn2+, CO32−, NO2−, SO42−, PO43−. v3 has 67 binding sites for Zn2+, two for Fe3+ (K153 Y179), six for Ca2+ (P228 Y229 P322 P326 Y327 V328) and three for Mg2+ (K22 K23 P24). No binding site was detected for the following ions: Cu2+, Fe2+, Mn2+, Na+, K+, CO32−, NO2−, SO42−, PO43−. v4 has 87 binding sites for Zn2+, two for Fe3+ (Y217 K271) and four for Ca2+ (P224 Y225 P302 Y303). No binding site was detected for the following ions: Cu2+, Fe2+, Mg2+, Mn2+, Na+, K+, CO32−, NO2−, SO42−, PO43−. All variants except v2 have Fe3+ binding sites.

Five different foot proteins are grouped under the Pvfps, two of which are variants of fp1. Their ligand-binding profiles showed distinct features. Pvfp1 v1 has 38 sites for Zn2+, two for Fe3+ (H547 H549), two for Ca2+ (D517 E528), five for Mg2+ (G525 K526 G548 G550 A559) and four for Na+ (P443 G548 H549 W551). No binding site was detected for the following ions: Cu2+, Fe2+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−. Pvfp1 v2 has 27 binding sites for Zn2+, two for Fe3+ (H417 H419), two for Ca2+ (P123 K124) and five for Na+ (M5 P33 G418 H419 W421). No binding site was detected for the following ions: Cu2+, Fe2+, Mg2+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−.

Pvfp5 has 80 sites for Zn2+ binding, 31 for Ca2+ and 22 for Na+, together with an acid radical CO32− binding site: C47 R48. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Mg2+, Mn2+, K+, NO2−, SO42−, PO43−. Pvfp6 has 38 sites for Zn2+ binding (C6 E24 Q28 C29 I35 C38 C40 I41 E43 N44 S45 E46 C47 D50 N52 C53 A56 C59 C60 D61 F62 C64 C66 N67 C70 C80 G84 Y87 F93 D97 C99 C102 C104 N105 D107 C112 K115 C117), two for Ca2+ (C102 C112) and a Na+ binding site: Q16 D50 S51 C53 C60 Y87 V96 D97 C99 N100. No binding site was detected for the following ions: Cu2+, Fe2+, Fe3+, Mg2+, Mn2+, K+, CO32−, NO2−, SO42−, PO43−.

Catechol-containing polymers and peptides can bind various metal ions (Zn2+, Cu2+, Fe3+, Ca2+, Mg2+, Mn2+, Na+ and K+) and acid radical ions (CO32−, NO2−, SO42− and PO43−). Overall, most of the foot proteins showed Zn2+ and Na+ binding ability, and only a few Mfps can bind acid radical ions.
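The summary statement can be made reproducible by counting, for each ion, how many proteins have at least one predicted site. The sketch below is illustrative only: it encodes a small subset of the proteins described above (Mcfp5, Mcfp8, Mgfp1, Mefp1, Mefp2 and Mufp3, variants omitted), with ion sets transcribed from this section; a full analysis would include every protein and variant.

```python
from collections import Counter

# Ions with at least one predicted binding site, transcribed from the text above
# (illustrative subset of proteins; variants omitted).
PREDICTED_IONS = {
    "Mcfp5": {"Zn2+", "Mg2+", "Na+", "CO3^2-"},
    "Mcfp8": {"Zn2+", "Ca2+", "Mg2+", "Na+"},
    "Mgfp1": {"Zn2+", "Na+"},
    "Mefp1": {"Zn2+", "Fe3+", "Mg2+"},
    "Mefp2": {"Zn2+", "Na+", "SO4^2-"},
    "Mufp3": {"Zn2+", "Na+"},
}
ACID_RADICALS = {"CO3^2-", "NO2^-", "SO4^2-", "PO4^3-"}


def ion_frequencies(table):
    """Count in how many proteins each ion has at least one predicted site."""
    return Counter(ion for ions in table.values() for ion in ions)


def proteins_binding_acid_radicals(table):
    """Names of proteins with at least one predicted acid-radical binding site."""
    return sorted(name for name, ions in table.items() if ions & ACID_RADICALS)


if __name__ == "__main__":
    freq = ion_frequencies(PREDICTED_IONS)
    print(freq.most_common(2))                             # [('Zn2+', 6), ('Na+', 5)]
    print(proteins_binding_acid_radicals(PREDICTED_IONS))  # ['Mcfp5', 'Mefp2']
```

Storing each protein as a set of ions keeps the "no binding site detected" information implicit (any ion absent from the set), which mirrors how the results are reported above.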

Unveiling the emergence of Bivalvia: Perspectives on mitogenome, TimeTree, and Mfps

Bivalve evolution is complex and began in the Cambrian period3. The origin of bivalves, the evolution of their phenotypes and the functional divergence of Mfps remain largely unresolved. Examining the evolution of byssus-producing bivalves from several perspectives revealed their functional divergence and speciation pattern.

Mitogenome – Phylogenetic construction analysis of bio-adhesive producing bivalves

Complete mitochondrial genome sequence analysis revealed the pattern of mitochondrial genome evolution in byssus-producing bivalves. Based on mitogenome evolution, all Mytilus species arise from the same clade, but, interestingly, Perna perna also originates from the Mytilus speciation node. Only three living species of the genus Perna currently exist, and the stem branch of P. canaliculus and P. viridis is entirely separate from P. perna. The monophyletic mitogenome tree initially splits into two branches. One branch contains three bivalve species from different orders: D. polymorpha (order Myoida) and two Mytiloida (P. canaliculus and P. viridis). In the other clade, Pectinoida separates first, followed by Ostreoida and Myoida. The Ostreoida mitogenome closely resembles that of Mytiloida. Among the Mytilus species, M. galloprovincialis originated first and is the ancestor of all other Mytilus spp. and also of P. perna. M. unguiculatus and M. californianus occur in the same clade.

Mitogenome-based order of speciation, from the most primitive to the most recently evolved byssus-producing bivalve: Mytilus galloprovincialis → Mytilus californianus → Perna perna → Mytilus unguiculatus → Mytilus edulis → Atrina pectinata → Perna viridis → Perna canaliculus → Dreissena polymorpha → Mizuhopecten yessoensis (Fig. 4F).

Figure 4

TimeTree analysis of (A) Bivalvia, (B) Mytilidae, (C) Dreissena sp., (D) Perna sp. and (E) Mytilus sp., generated with the TimeTree.org tool. (F) Maximum likelihood phylogeny based on mitogenomes (MUSCLE alignment, Tamura-Nei model), generated in MEGA X.
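Panel F relies on the Tamura-Nei (TN93) nucleotide substitution model. As a hedged illustration of what that model corrects for, the sketch below computes the closed-form TN93 pairwise distance between two aligned sequences, treating purine transitions (A↔G), pyrimidine transitions (C↔T) and transversions separately, with base frequencies pooled over both sequences. It is not the maximum likelihood tree search performed in MEGA X, and skipping gap or ambiguous positions is an assumption of this sketch.

```python
import math


def tn93_distance(seq1, seq2):
    """Tamura-Nei (1993) distance between two aligned DNA sequences.

    Sites with gaps or ambiguity codes in either sequence are skipped.
    Raises ValueError if a base is absent or the correction is undefined.
    """
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    n = len(pairs)
    if n == 0:
        raise ValueError("no comparable sites")

    # Base frequencies pooled over both sequences.
    counts = {base: 0 for base in "ACGT"}
    for a, b in pairs:
        counts[a] += 1
        counts[b] += 1
    gA, gC, gG, gT = (counts[base] / (2 * n) for base in "ACGT")
    if min(gA, gC, gG, gT) == 0:
        raise ValueError("all four bases must be present")
    gR, gY = gA + gG, gC + gT  # purine and pyrimidine frequencies

    # Observed proportions of the three kinds of differences.
    p1 = sum({a, b} == {"A", "G"} for a, b in pairs) / n       # purine transitions
    p2 = sum({a, b} == {"C", "T"} for a, b in pairs) / n       # pyrimidine transitions
    q = sum((a in "AG") != (b in "AG") for a, b in pairs) / n  # transversions

    w1 = 1 - gR * p1 / (2 * gA * gG) - q / (2 * gR)
    w2 = 1 - gY * p2 / (2 * gC * gT) - q / (2 * gY)
    w3 = 1 - q / (2 * gR * gY)
    if min(w1, w2, w3) <= 0:
        raise ValueError("sequences too divergent for the TN93 correction")

    return (-2 * gA * gG / gR * math.log(w1)
            - 2 * gC * gT / gY * math.log(w2)
            - 2 * (gR * gY - gA * gG * gY / gR - gC * gT * gR / gY) * math.log(w3))


if __name__ == "__main__":
    # Toy sequences differing by one A->G transition (hypothetical data).
    print(round(tn93_distance("ACGTACGTAC", "GCGTACGTAC"), 4))  # 0.1277
```

The corrected distance (about 0.128) exceeds the raw proportion of differing sites (0.1), which is the expected effect of accounting for unobserved multiple substitutions.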

TimeTree of Bivalvia

The evolutionary timetree of life (TTOL) helps in understanding the origin and diversity of life forms. Analysing clock-like molecular changes traces the processes and events of speciation and diversification. For the bivalve orders, TimeTree analysis revealed the ancestors of byssus-producing bivalves. Bivalve evolution starts in the Cambrian period (523 MYA), and the first diversification (clade separation) occurred at the start of the Ordovician period (488 MYA). Most diversification is observed from the Ordovician onward. The order Nuculanoida is the ancestral group, because it shows no diversification (clade separation), followed by Limoida. Byssus-producing bivalves belong mainly to four orders, i.e., Myoida, Mytiloida, Ostreoida and Pectinoida. Myoida is the first diversified group (~465 MYA), diverging from Mytiloida (~453 MYA). In the other stem branch from Mytiloida, diversification occurred in the Devonian period: Ostreoida (~398 MYA) diversified before Pectinoida (~378 MYA). Interestingly, the analysis found that only these two orders (Ostreoida and Pectinoida) diversified during the Devonian. After the major extinction (251 MYA), at the start of the Triassic period, Trigonioida and Unionoida diversified; these are the most recently evolved groups in Bivalvia (Fig. 4A).

In the Mytilidae timetree, the initial diversification of Mytilidae started about 387 MYA. During the Jurassic period (~172 MYA), the genera Mytilus and Perna separated from Perumytilus and Brachidontes. The clade separation of Mytilus and Perna occurred in the Late (Upper) Cretaceous. Speciation of Mytilus occurred 88 MYA, with M. californianus placed in a clade separate from the other species. The most recently evolved Mytilus species are M. galloprovincialis and M. trossulus, and M. edulis speciation occurred 2 MYA. Diversification and speciation of Perna started about 15 MYA, and the speciation of P. perna and P. canaliculus is observed at 6 MYA.
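The divergence dates quoted in this subsection can be placed on the geological time scale with a simple lookup. The sketch below assumes approximate period base ages as given in the International Commission on Stratigraphy chart of around 2009 (e.g. Ordovician base ≈ 488.3 MYA), chosen because it matches the 488 MYA start of the Ordovician used in the text; more recent charts place some boundaries slightly differently, so dates near a boundary should be read with care.

```python
# Approximate base ages (MYA) of Phanerozoic periods, oldest first.
# Assumption: values as in the ICS chart of around 2009, which dates the
# base of the Ordovician at ~488.3 MYA, consistent with the text.
PERIOD_BASES = [
    ("Cambrian", 542.0),
    ("Ordovician", 488.3),
    ("Silurian", 443.7),
    ("Devonian", 416.0),
    ("Carboniferous", 359.2),
    ("Permian", 299.0),
    ("Triassic", 251.0),
    ("Jurassic", 199.6),
    ("Cretaceous", 145.5),
    ("Paleogene", 65.5),
    ("Neogene", 23.03),
    ("Quaternary", 2.588),
]


def period_of(mya):
    """Return the period containing an age given in millions of years ago."""
    if not 0 <= mya <= PERIOD_BASES[0][1]:
        raise ValueError("age outside the Phanerozoic")
    # Scan from youngest to oldest; the first period whose base is at least
    # as old as the date is the one that contains it.
    for name, base in reversed(PERIOD_BASES):
        if mya <= base:
            return name


if __name__ == "__main__":
    # Divergence estimates quoted in the text.
    events = {"Bivalvia origin": 523, "first clade separation": 488,
              "Myoida": 465, "Ostreoida": 398, "Mytilidae": 387,
              "Mytilus/Perna vs. Perumytilus/Brachidontes": 172,
              "Mytilus speciation": 88, "Perna diversification": 15,
              "M. edulis": 2}
    for event, age in events.items():
        print(f"{event}: {age} MYA -> {period_of(age)}")
```

With these boundaries every date quoted above falls in the period the text assigns to it (for example, 488 MYA in the Ordovician and 172 MYA in the Jurassic).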

The TimeTree analysis thus reveals clock-like speciation and diversification. The speciation sequence of byssus-producing bivalves, from ancestral to most recently evolved, is Dreissena polymorpha → Mytilus californianus → Mytilus edulis → Mytilus galloprovincialis → Mytilus unguiculatus → Perna viridis → Perna perna → Perna canaliculus → Atrina pectinata → Mizuhopecten yessoensis (Fig. 4B–E).

The mitochondrial genome analysis offered a key insight into the origin of Mfp-producing bivalves: M. galloprovincialis evolved first, followed by M. californianus. In the TimeTree analysis, however, the first evolved bivalve is D. polymorpha, a brackish/freshwater form, again followed by M. californianus. Thus both the mitogenome and TimeTree analyses place M. californianus second. Under natural selection, the molecular clock indicates that the fresh/brackish-water bivalve evolved first, which is supported by the evolution and diversification of bivalve taxa. The mitogenome analysis gives a contradictory outcome: it suggests that, under natural selection pressure, diversification proceeded from marine bivalves towards brackish/freshwater species. The mitogenome is an important potential target of natural selection in taxa spread across environmental gradients20. Bivalves inhabit coastal habitats with dynamic conditions such as fluctuating temperature, salinity and dissolved oxygen, desiccation, UV radiation and exposure to chemical pollutants, which can induce oxidative stress21 and may influence mitochondrial respiration and cause irreversible damage to mtDNA22.

Intra-phyletic evolutionary relationship of Mfps

The evolutionary pattern of the foot proteins differs entirely from the mitochondrial genome pattern and the TimeTree analysis. The first evolved Mfp is Mcfp3 v9, followed by Mcfp3 v10, Mufp3, Mufp3 v7, Mufp3 v12, Mufp3 v9, Mufp3 v11, Mufp3 v8, Mcfp3 v7, Mcfp3 v3, Mcfp3 v11, Mcfp3 v8, Mcfp3 v5, Mcfp3 v4, Mufp3 v6, Dpfp1, Mufp3 v1, Mufp3 v4, Mufp3 v5, Mcfp6, Mufp3 v2, Mufp3 v10, Mufp3 v13, Mcfp3 v6, Mufp3 v14, Mcfp3 v2, Mcfp3 v1, Mufp3 v3, Myfp1 v2, Pvfp1 v1, Pvfp1 v2, Mgfp3 v2, Mgfp3 v1, Mufp6 v2, Mufp6 v4, Mufp6 v5, Mufp6 v6, Mufp6 v3, Mufp6 v8, Mufp6 v1, Mufp6 v7, Mufp6 v9, Mufp6, Mcfp6 v1, Mcfp5, Mcfp18, Mcfp6 v2, Mcfp6 v3, Mcfp10, Mcfp7 v1, Mcfp7 v2, Apfp1, Mcfp17, Mcfp9 v1, Mcfp9 v2, Mefp2, Mcfp14, Pvfp5, Mcfp12, Mcfp11, Mcfp13, Mcfp4 v2, Mufp2, Mcfp2, Mcfp8, Pcfp1 v3, Pcfp1 v4, Mcfp4 v1, Pvfp3, Pcfp1 v1, Pcfp1 v2, Mcfp1 v1, Mcfp1 v2, Pvfp6, Mefp1, Mgfp1 and Mcfp16, and Myfp1 v1 is the most recently evolved Mfp. The first foot protein appeared in M. californianus, followed by M. unguiculatus, D. polymorpha, M. yessoensis, P. viridis, M. galloprovincialis, A. pectinata, M. edulis and P. canaliculus. Interestingly, divergence under natural selection is evident in the Mfps, because during evolution each Mfp appeared at a different time, depending on dynamic environmental conditions (Fig. 5).

Figure 5

Phylogenetic analysis of mussel foot proteins (Mfps) by maximum likelihood with the JTT matrix model. Sequences were aligned using MUSCLE and the tree was generated in MEGA X. Each color code indicates the respective bivalve species (Drawing: YSV).
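The Mfp tree uses the JTT amino acid substitution matrix, whose 20 × 20 rate table is too large to reproduce here. As a simpler stand-in that shows the same idea of correcting observed differences for multiple substitutions, the sketch below computes the p-distance and the Poisson-corrected distance, d = −ln(1 − p), between two aligned protein sequences. This is a deliberately swapped-in model, not JTT, and ignoring gapped columns is an assumption of the sketch.

```python
import math


def protein_distances(seq1, seq2):
    """Return (p_distance, poisson_distance) for two aligned protein sequences.

    Columns with a gap ('-') in either sequence are ignored.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
               if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no ungapped columns")
    p = sum(a != b for a, b in columns) / len(columns)
    if p >= 1:
        raise ValueError("p-distance of 1: Poisson correction undefined")
    return p, -math.log(1 - p)


if __name__ == "__main__":
    # Toy aligned fragments (hypothetical, not real Mfp sequences).
    p, d = protein_distances("GYGGYK-GKY", "GYGGYKWGRY")
    print(round(p, 3), round(d, 3))  # 0.111 0.118
```

A matrix-based model such as JTT additionally weights each substitution by how often that particular amino acid exchange occurs, which this uniform Poisson correction does not.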

Evolutionary and environmental forces blend together to tune the unique composition, magnitude and function of each protein in the adhesive secretion. The functional properties of the proteins in the root, stem, thread and plaque of the byssus are highly specific and extremely intricate2,4,23. The evolutionary lineage of the Mfps reflects the sedentary lifestyle preferred by adult organisms, and Mfp properties are determined by the organisms' geographical habitats. Byssus-producing bivalves are distributed worldwide, and each geographic zone has specific dynamic characteristics such as tidal force, salinity, temperature and wave action. The evolution and functional divergence of Mfps may have been driven by these properties. Among the Mfp-producing bivalves, M. edulis shows a highly complex geographical distribution pattern. Mytilus spp. are widespread and exhibit an anti-tropical distribution, with M. edulis, M. californianus and M. unguiculatus occurring in the Northern Hemisphere and M. galloprovincialis in both the Northern and Southern Hemispheres24. Perna spp. occur mainly in the tropical zone, whereas P. canaliculus is distributed in the Southern temperate region (Fig. 6 and Supplementary data S1, Table 1).

Figure 6

Approximate geographical distribution of selected Bivalvia species, generated with the OBIS 2.0 server (2019) (https://mapper.obis.org/). [Available: Ocean Biogeographic Information System (OBIS), Intergovernmental Oceanographic Commission of UNESCO. www.iobis.org. Accessed: 12 September 2019]. (Supplementary S1).

Across all Mfps, the evolutionary lineage of foot proteins in the selected bivalve species is fp3, followed by fp5 and fp6 (in all Mytilus spp., although M. edulis fp3 is not available), except in P. viridis. Because fp3, fp5 and fp6 are found predominantly in the plaque region of the byssus and contribute to wet adhesion2, it can be concluded that fp3 is the ancestor of all other existing Mfps in byssus-producing bivalves and that wet adhesion is the core property of all Mfps. In Perna spp., the first evolved Mfp is fp1, which provides hydrophobicity and acts as a protective varnish layer25. Compared with the other Mfps, fp1 and fp2 are the most recently evolved, because fp1 mainly has protective functions and fp2 provides structural integrity to the adhesive plaques14. In Perna spp., fp1 is followed by fp5 and then by fp3 and fp6 in the evolutionary lineage of wet-adhesive Mfps. Unlike fp3, fp5 has a specialized mechanism for binding calcareous mineral substrates: it contains phosphoserine, and this post-translational modification confers the ability to bind calcareous materials. fp5 is the first foot protein produced by P. viridis for surface water replacement, followed by fp3 and fp6, which stabilize wet adhesion13. The Mfps thus reveal the functional evolutionary origin and speciation of Perna spp.26. Darwinian natural selection pressure is observed in the diversification of Mfp expression, because Mfps play a vital role in wet adhesion and support the sedentary lifestyle. Natural selection depends on the environment and requires existing heritable variation in a group. This is the first report of a phylogeny of all available Mfps together with an evolutionary analysis based on functional divergence.

Conclusion

This is the first report to use in silico methods to evaluate the physio-chemical, structural and functional characteristics of all available Mfps, revealing the unique features of each mussel foot protein (Mfp). Extracting and characterizing each Mfp from different mussel species, with the aim of creating strong adhesive materials, requires more than a thousand mussels. This work highlights several biochemical, molecular, structural and functional features of the Mfps, and these results can support the future development of bio-adhesives from different perspectives. We not only reveal the bio-adhesive properties of Mfps but also the complex evolutionary lineages and diversification of the Mfps and of selected bivalve species, together with their geographical distributions.

Materials and Methods

Datasets

Bivalve Mfp (mussel foot protein) sequences in FASTA format were retrieved from the NCBI protein database (https://www.ncbi.nlm.nih.gov/protein) in August 2019. Selection was mainly based on Mfp-producing bivalves for which the complete sequence of at least one adhesive protein has been identified.
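The selection step can be sketched in Python as a minimal FASTA parser followed by a completeness filter. This is an illustration under stated assumptions, not the authors' procedure: it assumes, as is common for NCBI protein records, that incomplete sequences carry the word "partial" in their definition line. The function names `parse_fasta` and `keep_complete` are hypothetical.

```python
from typing import List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (definition line, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: store the previous one, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def keep_complete(records: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Hypothetical filter: drop records whose definition line says 'partial'."""
    return [(h, s) for h, s in records if "partial" not in h.lower()]


example = """>seq1 foot protein 3 [Example species]
ADYYG
PKK
>seq2 foot protein 5, partial [Example species]
SSYY
"""
complete = keep_complete(parse_fasta(example))  # keeps only seq1
```

The example records are invented placeholders; real downloads would be read from the FASTA file exported from NCBI.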

Molecular modeling

The MUSTER algorithm was used for protein modeling10,27. This server (https://zhanglab.ccmb.med.umich.edu/MUSTER/) extends the earlier sequence profile-profile alignment (PPA) method, and the best template it identified was used for homology modeling of the Mfps. The models were evaluated with PROCHECK27,28 and the PDBsum server12,27, and visualized in PyMOL27 and EzMol 2.129.

Signal peptide prediction

The Phobius (http://phobius.sbc.su.se/)30 and SignalP 5.0 (http://www.cbs.dtu.dk/services/SignalP/)31 servers were used for signal peptide and topology prediction27 of the Mfps.

Functional characterization of Mfps

The FFPred 3 server (http://bioinf.cs.ucl.ac.uk/psipred/)17 was used for functional characterization of the Mfps. Predictions are made by scanning the input sequences against an array of support vector machines (SVMs). FFPred 3 uses a large SVM library that, for the first time, extends its coverage to the cellular component sub-ontology, prompted by the establishment of a dedicated evaluation category within the Critical Assessment of Functional Annotation (CAFA). For further analysis of the functional characterization of the Mfps, only predictions with a probability above 0.800 were retained.
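The 0.800 probability cut-off can be illustrated with a short Python sketch. It assumes that the FFPred 3 results for one protein have already been collected into a mapping from term label to probability; that data layout and the function name `filter_predictions` are hypothetical and do not reflect the server's actual output format.

```python
from typing import Dict, List, Tuple


def filter_predictions(predictions: Dict[str, float],
                       threshold: float = 0.800) -> List[Tuple[str, float]]:
    """Keep terms whose probability is strictly above the threshold,
    sorted from most to least probable."""
    kept = [(term, p) for term, p in predictions.items() if p > threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Placeholder term labels; real input would use the server's GO identifiers.
example = {"term_A": 0.91, "term_B": 0.80, "term_C": 0.35, "term_D": 0.97}
selected = filter_predictions(example)
```

Because the text says "above 0.800", a prediction of exactly 0.800 is excluded here.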

Chemical structural characterization of Mfps

The SAPS (Statistical Analysis of Protein Sequences) server32 (https://www.ebi.ac.uk/Tools/seqstats/saps/) evaluates a wide variety of protein sequence properties statistically. Properties considered include compositional biases, clusters and runs of charge and other amino acid types, different kinds and extents of repetitive structures, locally periodic motifs, and anomalous spacing between identical residue types.
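Two of the simplest properties listed, amino acid composition and runs of same-sign charge, can be sketched in Python. This is a simplified illustration and not the SAPS algorithm, which also assesses statistical significance. As an assumption of this sketch, K and R count as positive, D and E as negative, and H as neutral. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple

POSITIVE = set("KR")  # assumption: His treated as neutral
NEGATIVE = set("DE")


def composition(seq: str) -> Dict[str, float]:
    """Percentage of each residue type present in the sequence."""
    counts = Counter(seq)
    n = len(seq)
    return {aa: 100.0 * c / n for aa, c in sorted(counts.items())}


def charge_string(seq: str) -> str:
    """Map each residue to '+', '-' or '0'."""
    return "".join("+" if aa in POSITIVE else "-" if aa in NEGATIVE else "0"
                   for aa in seq)


def longest_charge_run(seq: str) -> Tuple[int, str]:
    """Length and sign of the longest run of consecutive same-sign charges."""
    best_len, best_sign = 0, "0"
    run_len, run_sign = 0, "0"
    for ch in charge_string(seq):
        if ch != "0" and ch == run_sign:
            run_len += 1                 # run of the same sign continues
        elif ch != "0":
            run_len, run_sign = 1, ch    # new run, or the sign flipped
        else:
            run_len, run_sign = 0, "0"   # neutral residue breaks the run
        if run_len > best_len:
            best_len, best_sign = run_len, run_sign
    return best_len, best_sign
```

For example, `longest_charge_run("AKKRGDEEDA")` finds a positive run of three (KKR) and a negative run of four (DEED) and reports the negative one.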

Physicochemical characterization of Mfps

The ExPASy ProtParam server (https://web.expasy.org/protparam/)33 was used to compute the physicochemical properties of the Mfps, including isoelectric point (pI), molecular weight (Mw), extinction coefficient (EC; used in quantitative studies of protein-protein and protein-ligand interactions), instability index (II; an estimate of protein stability), aliphatic index (AI; the relative volume of the protein occupied by aliphatic side chains) and grand average of hydropathicity (GRAVY; the sum of the hydropathy values of all amino acids divided by the number of residues in the sequence).
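Three of these indices have simple closed forms, sketched below in Python. GRAVY averages the Kyte-Doolittle hydropathy values over the sequence. The aliphatic index is X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is the mole percent of each residue. The extinction coefficient at 280 nm is 5500·nTrp + 1490·nTyr + 125·nCystine. This is a sketch, not ProtParam itself. It assumes only the 20 standard one-letter codes. DOPA is encoded as tyrosine in translated sequences, so it is scored as Y.

```python
# Kyte-Doolittle hydropathy scale
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def gravy(seq: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value."""
    return sum(KD[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index from mole percents of Ala, Val, Ile and Leu."""
    n = len(seq)

    def mole(aa: str) -> float:
        return 100.0 * seq.count(aa) / n

    return mole("A") + 2.9 * mole("V") + 3.9 * (mole("I") + mole("L"))


def extinction_coefficient(seq: str, all_cys_paired: bool = True) -> int:
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1).

    With all_cys_paired=True, every pair of Cys is assumed to form a cystine;
    otherwise all Cys are assumed reduced and contribute nothing.
    """
    n_trp, n_tyr, n_cys = seq.count("W"), seq.count("Y"), seq.count("C")
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return 5500 * n_trp + 1490 * n_tyr + 125 * n_cystine
```

For example, "WYYCC" gives 8605 M^-1 cm^-1 with its two Cys paired and 8480 M^-1 cm^-1 with them reduced. The instability index and pI are omitted because they need larger lookup tables (dipeptide weights and pKa sets).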

Accessible surface area (ASA) analysis

The VADAR server (http://vadar.wishartlab.com/)34, a compilation of more than 15 different algorithms and programs for analyzing and assessing peptide and protein structures from their PDB coordinate data, was used for this analysis.
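As a sketch of how accessible surface area is obtained from coordinates, the Python code below implements the Shrake-Rupley method. Each atom's sphere is enlarged by a solvent probe radius (1.4 Å assumed), covered with test points, and scored by the fraction of points not buried inside a neighbouring enlarged sphere. This is a standard generic technique, not necessarily the algorithm VADAR uses. The atomic radii and point count are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def sphere_points(n: int) -> List[Point]:
    """Approximately uniform points on a unit sphere (golden-section spiral)."""
    pts: List[Point] = []
    inc = math.pi * (3 - math.sqrt(5))
    off = 2.0 / n
    for k in range(n):
        y = k * off - 1 + off / 2
        r = math.sqrt(1 - y * y)
        phi = k * inc
        pts.append((math.cos(phi) * r, y, math.sin(phi) * r))
    return pts


def shrake_rupley(coords: Sequence[Point], radii: Sequence[float],
                  probe: float = 1.4, n_points: int = 960) -> List[float]:
    """Per-atom solvent-accessible surface area in square angstroms."""
    unit = sphere_points(n_points)
    areas: List[float] = []
    for i, (xi, yi, zi) in enumerate(coords):
        ri = radii[i] + probe
        # Only atoms whose enlarged spheres can overlap atom i's matter.
        neigh = [j for j in range(len(coords)) if j != i and
                 math.dist(coords[i], coords[j]) < ri + radii[j] + probe]
        accessible = 0
        for ux, uy, uz in unit:
            p = (xi + ri * ux, yi + ri * uy, zi + ri * uz)
            if all(math.dist(p, coords[j]) >= radii[j] + probe for j in neigh):
                accessible += 1
        areas.append(4 * math.pi * ri * ri * accessible / n_points)
    return areas
```

For an isolated atom every test point is accessible, so the result equals the full enlarged-sphere area 4π(r + 1.4)²; bringing a second atom within reach reduces both areas. The sketch compares every pair of atoms, which is fine for small models; production tools use spatial indexing.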

Ion ligand-binding site prediction

IonCom (https://zhanglab.ccmb.med.umich.edu/IonCom/)35 is a ligand-specific method for predicting binding sites of small ligands, including metal and acid radical ions. Starting from the sequences or structures of the query proteins, IonCom performs a composite binding-site prediction that combines ab initio training and template-based transferals. The server focuses on binding-site prediction for thirteen of the most important small ligands, comprising nine metal ions (Zn²⁺, Cu²⁺, Fe²⁺, Fe³⁺, Ca²⁺, Mg²⁺, Mn²⁺, Na⁺, K⁺) and four acid radical ions (CO₃²⁻, NO₂⁻, SO₄²⁻, PO₄³⁻).

Phylogeny construction of Mfps

Phylogenetic analysis of 78 Mfps was performed in MEGA X36 after the sequences were aligned with MUSCLE. The evolutionary history was inferred using the Maximum Likelihood method and the JTT matrix-based model37. The tree with the highest log likelihood (−3861.65) was retained. Initial tree(s) for the heuristic search were obtained automatically by applying the Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the JTT model, and then selecting the topology with the superior log-likelihood value. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site.
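The Neighbor-Join step that seeds the heuristic search can be sketched in Python. This is classic neighbor-joining only. It does not implement BioNJ, JTT distance estimation or the subsequent Maximum Likelihood search. The distance matrix is assumed to be given as a nested dictionary, and `neighbor_joining` is a hypothetical name, not a MEGA function.

```python
from typing import Dict, List


def neighbor_joining(names: List[str],
                     dist: Dict[str, Dict[str, float]]) -> str:
    """Classic neighbor-joining; returns an unrooted tree in Newick format.

    dist[a][b] is the (symmetric) distance between taxa a and b; needs >= 3 taxa.
    """
    d = {a: dict(dist[a]) for a in names}
    subtree = {a: a for a in names}      # node label -> Newick text
    active = list(names)
    while len(active) > 3:
        n = len(active)
        r = {a: sum(d[a][b] for b in active if b != a) for a in active}
        # Pick the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = active[i], active[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        la = d[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        u = f"_node{len(subtree)}"        # internal label, unique per join
        subtree[u] = f"({subtree[a]}:{la:g},{subtree[b]}:{lb:g})"
        d[u] = {}
        for c in active:
            if c not in (a, b):
                d[u][c] = d[c][u] = (d[a][c] + d[b][c] - d[a][b]) / 2
        active = [c for c in active if c not in (a, b)] + [u]
    a, b, c = active                      # resolve the final three-way star
    la = (d[a][b] + d[a][c] - d[b][c]) / 2
    lb = (d[a][b] + d[b][c] - d[a][c]) / 2
    lc = (d[a][c] + d[b][c] - d[a][b]) / 2
    return f"({subtree[a]}:{la:g},{subtree[b]}:{lb:g},{subtree[c]}:{lc:g});"


# Toy additive distances from the tree ((A:1,B:2):1,(C:3,D:4)).
pairs = {("A", "B"): 3, ("A", "C"): 5, ("A", "D"): 6,
         ("B", "C"): 6, ("B", "D"): 7, ("C", "D"): 7}
toy = {x: {} for x in "ABCD"}
for (x, y), v in pairs.items():
    toy[x][y] = toy[y][x] = v
newick = neighbor_joining(list("ABCD"), toy)  # "(C:3,D:4,(A:1,B:2):1);"
```

On additive distances like this toy matrix, neighbor-joining recovers the generating tree and its branch lengths exactly. That property is why it makes a good starting topology for the likelihood search.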

Ancestral analysis – mitogenome based

Ancestral states were inferred using the Maximum Likelihood method38 and the Tamura-Nei model39. The tree shows the set of possible nucleotides (states) at each ancestral node, based on their inferred likelihood at site 1; for each node, only the most probable state is shown. The initial tree was inferred using the method. Rates were treated as uniform among sites (Uniform rates option). This analysis involved ten nucleotide sequences, and the codon positions included were 1st + 2nd + 3rd + noncoding. Evolutionary analyses were conducted in MEGA X36 with MUSCLE alignment.
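The idea of keeping "only the most probable state" at a node can be illustrated with a small Python sketch of marginal likelihood reconstruction at the root for a single site. It uses Felsenstein's pruning algorithm. To stay short, the sketch swaps in the simpler Jukes-Cantor (JC69) model with equal base frequencies instead of the Tamura-Nei model used in the analysis. It also reconstructs only the root node. The tree structure and names are hypothetical, and MEGA's implementation will differ.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional

BASES = "ACGT"


def jc69(t: float):
    """JC69 probabilities (same base, a specific different base) over length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e, 0.25 - 0.25 * e


@dataclass
class Node:
    base: Optional[str] = None   # observed base at this site (leaves only)
    t: float = 0.0               # length of the branch to the parent
    children: List["Node"] = field(default_factory=list)


def partial_likelihood(node: Node) -> List[float]:
    """Pruning: L[x] = P(data below node | node has state x)."""
    if not node.children:
        return [1.0 if b == node.base else 0.0 for b in BASES]
    L = [1.0] * 4
    for child in node.children:
        Lc = partial_likelihood(child)
        same, diff = jc69(child.t)
        for x in range(4):
            L[x] *= sum((same if x == y else diff) * Lc[y] for y in range(4))
    return L


def root_state_posterior(root: Node) -> Dict[str, float]:
    """Marginal posterior of each base at the root (equal base frequencies)."""
    L = partial_likelihood(root)
    total = sum(0.25 * v for v in L)
    return {b: 0.25 * v / total for b, v in zip(BASES, L)}


# Star tree with observed bases A, A, G at one site.
root = Node(children=[Node(base="A", t=0.1), Node(base="A", t=0.1),
                      Node(base="G", t=0.1)])
post = root_state_posterior(root)
best_state = max(post, key=post.get)   # most probable root state
```

Replacing `jc69` with Tamura-Nei transition probabilities, and re-rooting to score each internal node, would bring the sketch closer to the analysis described above.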

TimeTree construction

An evolutionary timescale tree covering different orders of Bivalvia, genera of Mytilidae, and Dreissena, Mytilus and Perna species was constructed using TimeTree40 (http://www.timetree.org/search/goto_timetree). TimeTree is a public knowledge base of information on the evolutionary timescale of life, and the server builds a time tree for a group of species or for a custom list41,42,43,44,45,46,47,48,49,50,51.

OBIS map construction

The geographical distribution map of the selected Bivalvia species was constructed using the OBIS 2.0 mapper (https://mapper.obis.org/). The Ocean Biodiversity Information System (OBIS) is a global, open-access data and information clearing-house on marine biodiversity for science, conservation and sustainable development.