The discovered chimeric protein plays the cohesive role to maintain scallop byssal root structural integrity

Adhesion is essential for many sessile marine organisms. Unraveling the composition and assembly of marine bioadhesives is fundamental to understanding their physiological roles. Despite the remarkable diversity of animal bioadhesion, our understanding of this biological process remains limited to only a few animal lineages, leaving the majority enigmatic. Our previous study demonstrated that the scallop byssus has a distinct protein composition and an unusual assembly mechanism that differ from those of mussels. Here, a novel protein (Sbp9) was discovered in the key part of the byssus (the byssal root); it contains two calcium-binding domains (CBDs) and 49 tandem epidermal growth factor-like (EGFL) domain repeats. The modular architecture of Sbp9 represents a novel chimeric gene family resulting from a gene fusion event in which a tenascin-like (TNL) gene acquired the CBD2 domain from the Na+/Ca2+ exchanger 1 (NCX1) gene. Finally, free thiols are present in Sbp9, and the results of a rescue assay indicated that Sbp9 likely plays a cohesive role in byssal root integrity. This study not only aids our understanding of byssus assembly but will also inspire biomimetic material design.

BLAST 2.3.1+ was used to compare the sequences obtained by SMRT sequencing with those obtained by Sanger sequencing. The consensus sequences from BLAST were analyzed with MEGA7 (http://www.megasoftware.net/), and the final sequence was obtained by correcting against the multiple sequences, inserting missing bases and/or deleting redundant bases.
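
The correction step can be illustrated with a minimal sketch. It assumes the reads have already been aligned to equal length with "-" marking gaps, and it uses a simple column-wise majority vote; this is an illustrative stand-in, not the procedure performed in MEGA7. The function name and the toy reads are hypothetical.

```python
from collections import Counter

GAP = "-"  # assumed gap symbol in the aligned input


def majority_consensus(aligned_reads):
    """Return a gap-free consensus from equal-length aligned sequences.

    A column where most reads carry a base yields that base (a missing
    base is inserted for reads that have a gap there). A column where
    most reads carry a gap is dropped (a redundant base is deleted).
    """
    if not aligned_reads:
        raise ValueError("need at least one aligned sequence")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("aligned sequences must have equal length")

    consensus = []
    for column in zip(*aligned_reads):
        # most_common(1) returns the most frequent symbol in this column;
        # ties resolve to the symbol seen first.
        symbol, _ = Counter(column).most_common(1)[0]
        if symbol != GAP:
            consensus.append(symbol)
    return "".join(consensus)


# Hypothetical toy alignment: read 1 lacks one base and carries one extra base.
aligned = [
    "ATG-CGTTA",
    "ATGACGT-A",
    "ATGACGT-A",
]
print(majority_consensus(aligned))  # ATGACGTA
```

In this toy case the fourth column restores the base missing from the first read, and the eighth column removes the extra base that only the first read carries.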

Expression profiling of EC and related genes in scallop foot and other organs/tissues. The expression levels of EC and related genes were retrieved from the published RNA-seq datasets of C. farreri 1, including eleven adult organs/tissues (blood, eye, foot, female gonad, gill, hepatopancreas, kidney, mantle, male gonad, striated muscle, smooth muscle) and three foot subregions (tip, middle and root). Each organ or tissue was represented by three biological replicates. The expression value was calculated using the TMM algorithm implemented in edgeR software 2 and was represented as reads per kilobase per million mapped reads (RPKM). The expression levels of EC and related genes are represented by the average RPKM of three biological replicates.
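
As a rough illustration of the RPKM arithmetic, the sketch below computes RPKM per replicate and averages three replicates. It assumes the TMM normalization factor simply scales the library size into an effective library size; the actual values in the study came from edgeR, not from this code. The function names, read counts, library sizes and gene length are hypothetical.

```python
from statistics import mean


def rpkm(read_count, gene_length_bp, library_size, norm_factor=1.0):
    """Reads per kilobase per million mapped reads.

    norm_factor is an assumed TMM-style scaling factor applied to the
    library size; 1.0 means no adjustment.
    """
    effective_library = library_size * norm_factor
    return read_count * 1e9 / (gene_length_bp * effective_library)


def mean_rpkm(replicates, gene_length_bp):
    """Average RPKM over (read_count, library_size, norm_factor) tuples."""
    return mean(
        rpkm(count, gene_length_bp, library, factor)
        for count, library, factor in replicates
    )


# Hypothetical counts for one gene in three biological replicates of one tissue.
foot_replicates = [
    (500, 20_000_000, 1.0),
    (450, 18_000_000, 1.0),
    (600, 24_000_000, 1.0),
]
print(mean_rpkm(foot_replicates, gene_length_bp=2_000))  # 12.5
```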

Polyclonal antibody preparation. The polyclonal antibody was prepared and purified by ABclonal Biotechnology (Wuhan, China). Three rabbits were injected with CBD1 Sbp9, prepared as described under Recombinant protein over-expression and purification; the protein was dissolved in PBS and mixed with complete Freund's adjuvant at a ratio of 1:1, and the rabbits were boosted three times (at weeks 3, 6, and 9). The antibody was purified by affinity chromatography.

Evaluation of the chemical form of the Cys residues. The amount of nonoxidized Cys residues was quantified spectrophotometrically using Ellman's reagent (5,5'-dithiobis(2-nitrobenzoic acid), DTNB), which reacts with free thiol groups to yield 2-nitro-5-mercaptobenzoic acid (TNB) 3. EGFL2 and EGFL4 were left to stand overnight at ambient temperature (~25 °C). Then, 50 μL of 10 mM DTNB was added to 2.45 mL of 20 mM Tris-HCl pH 8.5 buffer containing ~1 mg/mL protein. The amount of thiol remaining in the reaction medium was quantified by measuring TNB absorbance at 412 nm. L-cysteine was used as a standard, and lysozyme, whose eight Cys residues form four disulfide bonds, was used as a negative control.
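
The quantification can be sketched as follows, assuming a linear L-cysteine standard curve of absorbance at 412 nm against thiol concentration. All absorbance readings, the protein molecular weight, and the function names are hypothetical placeholders, not data from this study; only the DTNB volumes and stock concentration come from the text.

```python
def final_concentration(stock_conc, stock_volume, diluent_volume):
    """Concentration after mixing a stock into a diluent (same volume units)."""
    return stock_conc * stock_volume / (stock_volume + diluent_volume)


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def thiols_per_protein(a412, slope, intercept, protein_mg_per_ml, protein_mw_da):
    """Moles of free thiol per mole of protein, from the standard curve."""
    thiol_um = (a412 - intercept) / slope
    # mg/mL equals g/L; dividing by g/mol gives mol/L; times 1e6 gives micromolar.
    protein_um = protein_mg_per_ml / protein_mw_da * 1e6
    return thiol_um / protein_um


# From the text: 50 uL of 10 mM DTNB added to 2.45 mL (2450 uL) of buffer.
dtnb_mm = final_concentration(10.0, 50.0, 2450.0)  # final DTNB in mM

# Hypothetical L-cysteine standards (micromolar) and their A412 readings.
standards_um = [0.0, 25.0, 50.0, 100.0]
standards_a412 = [0.0, 0.35, 0.70, 1.40]
slope, intercept = fit_line(standards_um, standards_a412)

# Hypothetical sample: A412 of 0.70 for 1 mg/mL of a 20 kDa protein.
ratio = thiols_per_protein(0.70, slope, intercept, 1.0, 20_000.0)
print(dtnb_mm, ratio)
```

A disulfide-bonded negative control such as lysozyme would be expected to give a ratio near zero under the same calculation.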

Recombinant protein over-expression and purification.
To enhance the yield of the recombinant proteins, codon optimization was carried out and the corresponding genes were synthesized. The CBD1 and EGFL4 fragments were obtained by PCR amplification using the synthesized CBD1 Sbp9-EGFL4 as the template. All primers are listed in Table S6. The fragments were then digested with restriction enzymes (BamHI and XhoI) and inserted into the modified pET-32-HisTT vector (modified from Novagen pET-32). All recombinant constructs were verified by DNA sequencing.
To over-express the recombinant proteins of interest, these constructs were transformed into E. coli BL21 (DE3) cells, which were then cultured in LB medium containing kanamycin (30 mg/L) at 37 °C. When the D600 reached ~0.6, protein over-expression was induced with 0.2 mM IPTG at 16 °C overnight. The cultures were harvested by centrifugation (1,500 g, 30 min, 4 °C) and the cells were resuspended in PBS. After sonication, the resulting lysates were clarified by centrifugation at 16,000 g for 10 min.
For CBD1 Sbp9, the protein was in the insoluble cell pellet, which was washed with PBS buffer and 1 M urea. The resulting pellet was then solubilized in 20 mM Tris-HCl pH 8.0 binding buffer containing 5 mM imidazole and 8 M urea, purified on Ni2+-NTA agarose, and eluted with 20 mM Tris-HCl pH 8.0 elution buffer containing 500 mM imidazole and 8 M urea. CBD1 Sbp9 was refolded on a Sephacryl S-100 column (GE Healthcare, Chicago, Illinois, U.S.) eluted with 20 mM Tris-HCl pH 8.5. EGFL4 was prepared in the same way as CBD1 Sbp9, except that the Ni2+-NTA step was omitted.
The purity of the recombinant proteins was evaluated by 15% SDS-PAGE (Figure S2). The

Figure S3. Sequence alignments of EGFLs in Sbp9
The PCGGPC motif spanning the first two Cys residues, which is unique among EGFLs (also highlighted in blue in 2D), is highlighted in blue.