Introduction

Natural materials have remarkable functional properties and ambient processability: threads with high strength, glues that cure underwater, and ceramics that resist fracture. Multiple functions emerge when materials evolve under harsh conditions; for instance, many natural marine glues exhibit added bond toughness to withstand dynamic ocean currents or wave action at the seashore. For sessile marine organisms such as barnacles, oysters and tubeworms, the adhesive is required to function for a lifetime and persists long afterwards. Such permanent adhesives use inventive adaptations to achieve durability, such as hierarchical structuring and chemical modification at the molecular and supramolecular level1,2. Despite inspiring scientific inquiry for more than a century3 and impeding maritime operations even today4, the permanent bond of adult barnacles is among the least understood. In contrast to marine organisms that use glues to fabricate protective shelters (e.g., sand-castle worm tubes5, case-maker fly larva retreats2, and amphipod tubes6,7) or tie themselves to rocks (e.g., mussel byssus threads8,9), adult barnacles produce their adhesive interface in a sequential process hidden under their base as a part of their normal growth cycle10,11. The recent finding that barnacle adhesive is nanostructured and held together as an amyloid-like material12,13,14 further distinguishes it from archetypal marine adhesives processed into solid foams1,9,15 or spun threads2,6.

Barnacle glue maintains a high beta sheet content12,16. The adhesive nanofibers produced are particularly insoluble and consist of numerous protein components17,18. Over a 20-year period, five protein sequences have been identified through the use of aqueous denaturants such as guanidine hydrochloride, formic acid, and urea17,18,19. These treatments have solubilized a fraction of barnacle cement, though the inability to target the primary interactions between cement proteins remains a significant roadblock. Putative functions have been assigned to each of the known proteins based on sequence chemistry and observed abundance upon dissociation. For example, a 19 kDa hydrophilic component is assigned as a versatile surface-binding protein, while 52 and 100 kDa proteins are thought to comprise the bulk of fibrillar cement18,20,21. High levels of aliphatic residues in these cement components have led to a hypothesis that the barnacle adhesive is partially held together and operates through a hydrophobic effect17,20,22,23,24. However, it has been difficult to establish the main components of bulk cement, as many demonstrate in vitro fibril formation14,25,26,27, while a significant portion remains insoluble and, therefore, unidentifiable.

Progress in understanding cement has also left inconsistencies in knowledge of protein composition and the resulting chemical interactions that hold nanofibrils together. An observed release of proteins from bulk glue in the presence of reducing agents led to the assertion that disulfide bonding plays a role in cement cross-linking19,23,28. However, there is no spectroscopic evidence for S-S bonding in the cement layer29,30 and cysteine content remains low or nonexistent in the proteins thought to comprise bulk glue or the native cement itself10,19. Discrepancies exist between the handful of identified components and the amino acid composition of the proteinaceous cement layer, while major components remain unsequenced. Many uncertainties arise from the use of techniques poorly suited to sequence and analyze mixed protein samples or aggregates.

Recently, extensive studies of mRNA and protein expression in barnacles have provided systemic insight into metamorphosis, adult development and molting31,32,33,34. This has been enabled by the development of high throughput RNA sequencing (RNA-seq) as well as tandem mass spectrometry methods that together create proteomic databases directly from complex materials35. These methods are ideal to study the composition of barnacle glue, as protein complexes are sequenced as cleaved peptide fragments. Analysis of transcript sequences collected from the basal membranes of barnacles A. amphitrite and T. japonica confirmed cement proteins are produced in tissues contained just above the baseplate34,36,37. Both studies performed on A. amphitrite have revealed two additional cement proteins that share primary structure and amino acid chemistry with existing sequences, suggesting that barnacles produce cement proteins in specialized subfamilies34,37. However, transcriptomic studies have revealed few new proteins in addition to those identified more than a decade ago. Without direct proteomic sequencing of solubilized cement, insight into nanostructure composition and the relationship of components with themselves or other adhesive materials remains a challenge.

To determine the sequence basis of cement nanostructure, we have established a comprehensive barnacle cement proteome. In this work, we employ a targeted approach to disassemble the adhesive interface and report ca. 50 proteins, most of which were previously unidentified. We construct our proteome using three strategies: i) milligram-scale collection of cement attached by barnacles onto glass microspheres and collection of a thick and opaque cement type, ii) non-covalent breakdown of collected materials through the use of organic solvents and iii) transcriptome-led protein sequencing of individual bands from SDS PAGE gels as well as proteins collected from the entire gel lane. Based on the findings from these collections, two classes of proteins are defined, glycine/serine-rich cement proteins (GSrCPs) and leucine-rich cement proteins (LrCPs). The former share a conserved primary structure with previously identified proteins. Polar GSrCPs are found to share homology to certain silk motifs through short domains that define a distinct primary structure. The collection techniques and subsequent data analysis offer the clearest and most comprehensive picture of barnacle cement composition to date and reveal a prominent role for GSrCPs and low complexity in the construction of barnacle cement nanofibrils.

Results

Attachment Surfaces are Coated with Dense Nanofibrils Dissolved by Organic Solvents

The adhesive secretions of barnacles are particularly challenging to access: the proteinaceous layer is typically about a micron in thickness and is bonded permanently to the substrate during the life of the barnacle; in Amphibalinid species this layer is protected under an opaque calcium carbonate base plate. Thus, three collections (Fig. 1a) were compared: (a) material from large barnacles (5–10 mm) removed from silicone panels that exhibited an opaque, fibrillar adhesive (‘opaque’, Fig. 1a i,ii); (b) material on glass microspheres (48–85 μm diameter) that remained attached to barnacles grown on and then lifted off a bead bed (‘bead’, Fig. 1a iii,iv); and (c) adhesive secreted under barnacles grown for two months on sodium aluminoborate glass that forms a hydrated reaction layer facilitating barnacle removal, referred to as a ‘medallion’ collection34,38. Adhesive from ‘opaque’ and ‘bead’ collections was not readily soluble in known protein denaturants such as urea (Fig. 1b) or in the moderately polar organic solvent ethyl acetate (EtAc), but readily released protein in strongly polar hexafluoroisopropanol (HFIP). Like the thicker fibrillar adhesive under the barnacle, fibrils on beads were insoluble in EtAc and urea, and material was observed to remain until exposure to HFIP (Supplementary Figure S1)13,30. Solubilizing the adhesive with HFIP left little residual material in the loading well of PAGE gels and yielded a distribution of protein bands with molecular weights similar to those found previously by others17,19. Generally, breakdown using HFIP yields 5 main bands (150, 100, 63, 35, 20 kDa) from ‘bead’ collections and 7 bands (250, 100, 63, 35, 20, 19, 14 kDa) from ‘opaque’ adhesive. The ‘medallion’ samples were sufficiently solubilized by DTT, with bands at 250, 70, 63 and 20 kDa. In contrast to previous studies, the most prominent band in these solubilized samples was at an apparent molecular mass of 63 kDa.

Figure 1

Collection and breakdown of cement samples from adult A. amphitrite barnacles.

(a) Examples of fine nanofibrils networked into dense materials observed in multiple collection methods: (i) Thick and opaque cement layer secreted from the barnacle underside, (ii) SEM micrographs of (i) showing a mat consisting of fine nanomaterials, scale bar represents 1 μm. (iii) Underside of a barnacle settled on a bed of glass microspheres after one day, with large accumulations at the periphery, (iv) SEM micrographs of barnacle-adhered microspheres entrapped by cement secretions, scale bar is 100 μm. Inset, 2.25 μm² image showing close up of nanofibrils on beads, scale bar is 500 nm. (b) Breakdown of collected cement using ethyl acetate (lanes 1 and 5), urea (lanes 2 and 6), hexafluoroisopropanol (lanes 3 and 7), and dithiothreitol (lanes 4 and 8) solvents, with eluted proteins analyzed by SDS-PAGE. (c) Full length PAGE run of ‘opaque’ and ‘microsphere’ adhesives after dissolution by HFIP with an abundant protein released at 63 kDa among other well-known bands at 250, 100, 35, and 19 kDa. Left is ‘opaque’ cement, right is ‘microsphere’. Uncropped whole gels from (b,c) are available in Supplementary Figure S7.

MS/MS Analysis of Solubilized Nanofibrils Yields Many Proteins and Reveals the Sequence of a Major Cement Component

Combining peptide sequences derived from whole lane sequencing, defined as aggregate bands, of ‘opaque,’ ‘bead,’ and ‘medallion’ collections reveals that the barnacle adhesive interface is chemically complex and develops from a large number of proteins. A total of 1113 unique peptides belonging to 90 putative proteins were identified by searching a translated cDNA database generated from the cement gland region as reported in our previous transcriptomic study34. Comparative analysis of proteins from aggregate MS/MS data demonstrates a set of common proteins among collection methods that share at least four unique peptide matches (Fig. 2a). More than two-thirds of the proteins were identified in all three collections, while nearly 90% of identified proteins were shared when only comparing ‘opaque’ and ‘bead’ collections (Fig. 2a). Inspection of all peptides identified from three types of collected cement shows that a majority are shared among 25 proteins coded by our transcript database (Supplementary Table S1). Identified proteins are largely mildly hydrophilic, while a small portion have Grand Average Hydropathy (GRAVY) values above 0 (Fig. 2b). All collection methods reveal the presence of previously reported Amphibalanus amphitrite cement proteins (AaCPs, where the theoretical molecular weight in kDa is appended to the name) AaCP19 (ref. 34), AaCP100 (ref. 36), AaCP114 (ref. 34), AaCP52 (ref. 36), and settlement inducing complex (SIPC) AaCP170, but not AaCP14.
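The comparison of collections in Fig. 2a amounts to filtering each collection by a peptide-count threshold and then intersecting the resulting protein sets. The following is a minimal sketch of that bookkeeping, not the authors' pipeline: the function names, the toy protein IDs and counts, and the choice of the union as denominator are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Set

MIN_PEPTIDES = 4  # threshold stated in the text for Fig. 2a


def confident_proteins(peptide_counts: Dict[str, int],
                       min_peptides: int = MIN_PEPTIDES) -> Set[str]:
    """Return protein IDs supported by at least `min_peptides` unique peptides."""
    return {pid for pid, n in peptide_counts.items() if n >= min_peptides}


def shared_fraction(collections: Dict[str, Dict[str, int]],
                    names: Iterable[str]) -> float:
    """Fraction of proteins seen in any named collection that are seen in all.

    Assumption: the denominator is the union of the filtered protein sets.
    """
    sets = [confident_proteins(collections[n]) for n in names]
    union = set().union(*sets)
    if not union:
        return 0.0
    common = set.intersection(*sets)
    return len(common) / len(union)


# Hypothetical toy counts; IDs and numbers are placeholders, not study data.
toy = {
    "opaque": {"protA": 12, "protB": 5, "protC": 4, "protD": 2},
    "bead": {"protA": 9, "protB": 6, "protC": 7},
    "medallion": {"protA": 4, "protB": 1, "protC": 8},
}

print(shared_fraction(toy, ["opaque", "bead"]))
print(shared_fraction(toy, ["opaque", "bead", "medallion"]))
```

With real data, each inner dictionary would hold the unique peptide count per transcript ID for one collection method.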

Figure 2

Broad sequence properties of the combined proteome showing (a) Number of identified proteins shared among sample collection methods that contain at least four peptides. (b) Bar graph of proteins sorted by GRAVY values showing 80% are hydrophilic with hydropathy values at around −0.4, while 14% are more hydrophobic with values above 0. (c) Graph of isoelectric point (pI) values in increasing order, with an average of 8.9±2.4 where 40% of the proteins lie above 10. (d) Pairwise E-values for 52 protein sequences represented by a two toned look up table, clustered into 7 regions of homologous proteins with 10 outlying individual pairs. Darker blue regions have no homology. Yellow squares indicate identity. (e) Outline and identification of self-similar protein families, named by the highest scoring protein sequence.

The most abundant proteins separated by SDS-PAGE were found at 63 kDa. This band, as well as other prominent bands, was excised from gels and digested with trypsin for MS/MS analysis. Peptides derived from the 63 kDa band were found to be coded by transcript comp41238_c0_seq1 from our transcriptome database, containing 448 amino acids with a predicted molecular weight of 43 kDa, named by previous convention as AaCP43 (Supplementary Table S2). In most cases, AaCP43 maintained the highest number of peptides among single 63 kDa band analysis (Supplementary Table S2) and typically occupied one of the highest peptide counts in the aggregate band analysis (Supplementary Table S1). Although this protein runs at 63 kDa by PAGE, analyzed peptides from aggregate and isolated bands cover only the established coding region of 43 kDa, leaving a 20 kDa discrepancy that could be due to protein complexation or post-translational modifications. To shed light on this, the 43 kDa coding region was recombinantly expressed in E. coli and found to behave similarly to the wild-type protein, migrating to ca. 60 kDa by SDS-PAGE (Supplementary Figure S2 and Supplementary Methods). Treatment of the solubilized glue with glycosidases also yielded no shift in molecular weight of this band, suggesting that heavy glycosylation and protein complexation are not responsible for the 20 kDa shift. Acid hydrolysis of this band matches the sequence composition for AaCP43 (Supplementary Figure S3) and is also consistent with the composition of major, but unsequenced, bands in other barnacle species, including the major 58 kDa band found by Naldrett et al. in B. eburneus17 and a 68 kDa band identified by Kamino et al. in M. rosa19. Compositions of these bands across species share greater than 10% glycine, serine, threonine and alanine, indicating that homologs of AaCP43 may exist in at least two other species. The full translated sequence length of AaCP43 was verified by RT-PCR on collected mRNA samples34, revealing only three nucleotide mismatches from the RNA-seq data (Supplementary Figure S4).

Cement Proteome Contains a High Number of Homologous Proteins

Sequences from aggregate band analysis of ‘opaque,’ ‘beads’, and ‘medallion’ samples were pooled with single band analysis, including an additional sample of barnacles removed from glass coverslips34 (Supplementary Tables S3–5) to produce a combined cement proteome. Sequence similarity among highest scoring proteins was explored by performing pairwise alignment (BLAST), where expectation values (e-values) between 52 selected proteins were clustered hierarchically by shared homologies into a 2D array (Fig. 2d). Identical positions are marked in yellow, regions of similarity (e-value < 10⁻⁴) are red, and blue gradients represent low and null values with no homology as the darkest blue value. A number of high similarity regions emerge along with several discrete points outside these regions (Fig. 2e, listed in Table 1). The largest region, labeled ‘19-like’, is centered around AaCP19 and comprises nine proteins: five full length sequences with MW ranging from 19 to 49 kDa, three partial C-/N-terminus sequences, and one more partial sequence we found only in the translated protein database. These sequences are grouped by a family name ‘Aa19’. Two other regions are found around barnacle proteins AaCP52 and SIPC36, as well as one around the abundant protein AaCP43 discussed above (grouped by family names ‘Aa52’ and ‘Aa43’ respectively). We found three additional smaller clusters. The first of these clusters consists of three full protein sequences having no known homology to previously reported proteins; this was labeled ‘AaCP57-like’ (grouped by family name ‘Aa57’) using the predicted MW of the highest scoring protein sequence. The last two larger groupings were identified by searching the nrNCBI protein database, revealing homology to protease inhibitors and the enzyme lysyl oxidase (Lox), labeled ‘Lox-like’ (see Table 2). Finally, discrete points of similarity revealed several other pairs including two 105 kDa proteins (Table 1). Other known homologues (AaCP100/AaCP114 (ref. 34) and AaCP20-1/AaCP20-2 (ref. 37)) were found to verify the similarity matrix.
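The grouping of proteins into families by pairwise e-value can be illustrated with a simplified stand-in. The sketch below log-transforms e-values and links any two proteins whose e-value is at or below the 10⁻⁴ cutoff, merging linked proteins with a union-find structure. This is single-linkage grouping, which is an assumption made for brevity; it is not the uncentered-Pearson, mean-linkage hierarchical clustering performed with Cluster 3.0 in this work. The floor applied to zero e-values, the function names and the toy protein labels are likewise hypothetical.

```python
import math
from typing import Dict, List, Tuple

E_FLOOR = 1e-180       # assumption: floor so a reported e-value of 0 has a finite log
FAMILY_CUTOFF = 1e-4   # similarity threshold given in the text


def ln_evalue(e: float) -> float:
    """Natural log of an e-value, floored to avoid log(0)."""
    return math.log(max(e, E_FLOOR))


def group_families(proteins: List[str],
                   evalues: Dict[Tuple[str, str], float],
                   cutoff: float = FAMILY_CUTOFF) -> List[List[str]]:
    """Group proteins connected by any pairwise e-value <= cutoff (single linkage)."""
    parent = {p: p for p in proteins}

    def find(p: str) -> str:
        # Walk to the root, compressing the path along the way.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for (a, b), e in evalues.items():
        if a != b and e <= cutoff:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for p in proteins:
        groups.setdefault(find(p), []).append(p)
    # Largest families first, then alphabetical for a stable order.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


# Hypothetical toy input; labels and e-values are placeholders, not study data.
toy_proteins = ["p1", "p2", "p3", "p4", "p5"]
toy_evalues = {("p1", "p2"): 1e-30, ("p2", "p3"): 1e-6,
               ("p3", "p4"): 0.5, ("p4", "p5"): 1e-3}
print(group_families(toy_proteins, toy_evalues))
```

Single linkage will chain families together through one strong pair, so a matrix-based clustering such as the one used here is preferable when families overlap.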

Table 1 Classification of cement-related proteins from aggregate sequencing into Glycine/Serine rich cement proteins (GSrCPs) and Leucine rich Cement Proteins (LrCPs).
Table 2 Annotations for cement proteins determined by searching nrNCBI.

The amino acid compositions of previously named proteins AaCP19, AaCP52, AaCP100 and AaCP114 place them at a hydrophobic extremity (ca. −0.1 to 0.2), while the new Aa43, Aa57 and Aa19 families are closer to the average hydropathy for the cement proteome shown in Fig. 2b (ca. −0.4). This distinction is more pronounced (Fig. 3b) when a measure of alanine, valine, isoleucine and leucine side chain volume, the aliphatic index of the proteins, is considered. Therefore, we categorize proteins by composition into Glycine/Serine-rich cement proteins (GSrCP, including the Aa19, Aa43, and Aa57 families) and Leucine-rich cement proteins (LrCP, including the Aa52 and Aa100 families) as defined in Table 1 and Fig. 3c. The GSrCPs are polar with an abundance of Gly/Ala/Ser/Thr residues, while LrCPs are aliphatic, rich in Val/Leu/Ile residues. We propose a new naming scheme for the two broad categories of cement proteins found: GSrCP-[family name]-[x] for new polar proteins and LrCP-[family name]-[x] for existing aliphatic proteins. In addition to broad compositional similarities, various single residues are enhanced within each class of proteins, marked by asterisks in Fig. 3c to include Arg/Pro/Lys/Tyr. For comparison, the compositional profile of the hydrolyzed ‘bead’ collection is shown in the lower right corner of Fig. 3c.
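The two indices used for this classification are simple functions of sequence composition. As a minimal sketch (the values in this work were computed with ProtParam, not with this code), GRAVY is the mean Kyte-Doolittle hydropathy per residue, and the aliphatic index of Ikai is X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is mole percent. The function names and the `composition_fraction` helper are illustrative assumptions.

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _clean(seq: str) -> str:
    """Uppercase the sequence and reject empty or non-standard input."""
    seq = seq.upper()
    if not seq or any(aa not in KYTE_DOOLITTLE for aa in seq):
        raise ValueError("sequence must be non-empty and use the 20 standard residues")
    return seq


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = _clean(seq)
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Ikai's aliphatic index from mole percent of Ala, Val, Ile and Leu."""
    seq = _clean(seq)

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def composition_fraction(seq: str, residues: str) -> float:
    """Fraction of the sequence made up of any residue in `residues`."""
    seq = _clean(seq)
    return sum(seq.count(aa) for aa in set(residues.upper())) / len(seq)


# Hypothetical peptide used only to exercise the functions.
example = "GSGSATGSLV"
print(gravy(example), aliphatic_index(example))
print(composition_fraction(example, "GSAT"), composition_fraction(example, "VLI"))
```

Comparing `composition_fraction(seq, "GSAT")` with `composition_fraction(seq, "VLI")` gives a rough view of the GSrCP versus LrCP distinction; no numeric cutoff is implied by the text, so none is encoded here.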

Figure 3

Classification of homologous cement proteins by amino acid composition.

(a) Corresponding values of hydropathy ranging from hydrophobic to hydrophilic, (b) Aliphatic index values of cement proteins are bimodal, when relative volumes of alanine, valine, isoleucine and leucine side chains are compared. (c) Left, polar GSrCPs defined by an abundance of glycine, alanine, serine and threonine residues. Right, aliphatic LrCPs defined by an abundance of valine, isoleucine and leucine residues and higher aliphatic index values. Bottom right, acid hydrolysis of whole cement from microspheres yields an abundance of glycine, alanine, serine and threonine in addition to hydrophobic residues. Asterisks indicate residues that are abundant in individual components.

GSrCPs Share Conserved Primary Structure with Archetypal Cement Proteins

The Aa19 family shares considerable primary structure homology with other known cement proteins, as revealed by multiple sequence alignment analysis. A 166 residue domain from AaCP19 is highly conserved among Aa19-2, -3, -4 and -5 proteins. While Aa19-3, -4, and -5 each contain two repeated blocks, Aa19-2 comprises three repeated blocks of AaCP19. Multiple alignment of these 10 domains is shown in Fig. 4d, where 102 residues out of 170 (60%) are conserved among at least half of the protein sequences. Residues present at 10% or above (Ala/Gly/Ser/Thr) match the distinct amino acids found in the GSrCPs, with the addition of Val. The high number of alignments among the Aa19 family reveals that low complexity regions are conserved, marked by black lines in Fig. 4b,d, consisting of small and uncharged side-chains. Inspection of the GSrCP sequence Aa43-1 yields a triple-repeat segment of ca. 110 residues, with 63 conserved (58%, Fig. 4a,b). In both protein families, each domain maintains a pattern of low complexity, further flanked by regions rich in charged and polar residues (e.g. Lys, Arg, Glu, Asp, Gln, and Asn).

Figure 4

Sequence homology of Aa43 and Aa19 GSrCPs.

(a) Primary structure of Aa43-1 showing three homologous domains spanning ca. 100 residues each. (b) Multiple sequence alignment of three low complexity domains in Aa43-1 showing conservation of 60% residues with shared chemistry, (c) primary structure of four GSrCPs with repeated domains of high homology to AaCP19 (d) Multiple sequence alignment of all 19-homologous domains, where conserved regions are composed mainly of low complexity residues. Black lines highlight regions of low complexity.

Cement contains Multiple Oxidases and Proteases

Twenty-two proteins coded by our transcript ID database share significant homology to sequenced proteins in the non-redundant NCBI database (nrNCBI). These proteins are summarized in Table 2; seven belong to the oxidoreductase family, comprising three lysyl oxidases (AaLox-1, -2, -3), three peroxinectins (AaPxt-1, -2, -3) and one peroxidase (AaPx-1). The A. amphitrite Lox is closely related to a homolog found in Drosophila melanogaster (Dmloxl-2, CAB99481), containing two cysteine-rich scavenger domains as well as the lysine tyrosylquinone co-factor linkage, copper binding site and conserved cytokine receptor-like region39 (Supplementary Figure S5). AaPxt-1 and -2 are homologous to a heme-peroxidase found in the fly larvae of Hesperophylax occidentalis40 (KM384736, Supplementary Figure S6), with three distal and two proximal heme cavities. Additionally, we find both a serine protease (AaSP) and three homologous protease inhibitors (AaPI-1, -2, -3) in the proteome. Three forms of MULTIFUNCin are identified in A. amphitrite (AaMulti-1, -2, -3), which are cues that promote predation as well as cyprid settlement41.

New Cement Protein Sequences Display Silk-like Signatures

To explore the homology of polar GSrCP sequences with other organisms, we remove compositional bias filtering from BLAST to search nrNCBI using only the native scoring matrix (BLOSUM62). Searches limited to arthropods (including crustaceans) by this method yield a considerable number (200+) of additional alignments. While aliphatic LrCPs maintain alignments solely among barnacles as seen in filtered nrNCBI searches, polar GSrCPs share significant homology to large numbers of non-barnacle proteins (summarized in Table 3, full results in Supplementary Table S6). Unfiltered nrNCBI searching corroborates multiple sequence analysis from Fig. 4d, where the highest scoring results for the Aa19 family belong to AaCP19 from A. amphitrite and other barnacle species8. However, certain Aa19 proteins, such as Aa19-2 and -5, exhibit greater similarities with fiber-forming proteins over AaCP19 (Table 3). Aa43-1, on the other hand, exhibits strong alignment with multiple silk protein constructs including fibroins, egg stalks, and gum sericins. In fact, silk-related proteins are common among the highest scoring alignments across a majority of GSrCPs. Proteins appearing with highest frequency are a moth egg stalk silk (ACN87362) as well as the silk gum sericin (AGN03940.1), occupying the top two alignments for Aa43-1, Aa19-2 and Aa19-5 as well as other proteins coded from the transcript database (comp27593_c0_seq1, comp27343_c0_seq1, comp48220_c0_seq1). Interestingly, some polar GSrCPs include significant homologies to portions of spider silk sequences: Aa19-2 with orb weaver dragline silk (AAL32375.1) and minor ampullate spidroins, while Aa19-3 aligns with a pyriform spidroin (ADK92884.1) used to adhere dragline silks to solid surfaces. Alignment of silk protein sequences to GSrCPs visualized by dot plotting (Fig. 5a) reveals that silk homology is confined into short domains, defining a distinct primary structure in this class of cement proteins (Fig. 5b). Regions between silk homologous domains display a high number of basic residues rich in arginine and lysine, highlighted in Fig. 5b as ‘complex’.

Table 3 Search results for GSrCPs against unfiltered nrNCBI, showing highest alignment with silk and cement proteins.
Figure 5

Shared primary structure of polar GSrCPs.

(a) Recursive dot plot analysis of GSrCP-Aa43-1, GSrCP-Aa19-3, and LrCP-Aa52-1 against archetypal silk proteins, highlighting regions of low complexity along primary sequences. (i) AaCP19, (ii) Egg Stalk Silk from M. signata, (iii) Heavy Chain Fibroin from B. mori, (iv) Heavy Chain Fibroin from R. fugax, (v) Sericin I from B. mori, (vi) Spidroin I from N. clavipes. Dots represent silk homologous domains (SHDs). (b) Distinct alternating primary structure observed in GSrCPs defined by alternating silk homologous and complex domains. (c) Pairwise alignment of archetypal silk proteins to representative SHDs, demarked by dot plotting, where bold letters are identical to the query sequence and italicized bold letters are chemically similar. (d) Stringency of primary sequence in GSrCP low complexity regions measured by pairwise alignment to de novo silk-like motifs, showing higher stringency with [SS] based models over [GG]. Percentages are the number of similar and identical alignments between pairs, divided by cement protein sequence length.

Since GSrCPs favor alignment with sequences that contain repetitive -SS- or S-X motifs, we ask whether the observed low complexity requires a degree of order. Pairwise alignments between GSrCPs and both natural (fibroin, elastin, collagen) and de novo fiber-forming motifs (-GS-, -SSGG-, -SSG-, -SSSSG-, -GGS-, -GGGGS-, -GA-, -GGAA-, -GGA-, -SA-, -SSAA-, -SSA-) spanning the length of cement proteins show a clear preference for motifs containing -SSXX-. For certain GSrCPs (Aa19-2 and -3), di-ser sequences are highly favored over di-gly and di-ala motifs. Previous alignments with the nrNCBI protein database are supported by this analysis, where lacewing stalk silk rich in -SSSS- motifs is preferred over the alternating motif of fibroin from B. mori (-GSGAGA-). In contrast, alignments are weak with other well known proteinaceous fibers that contain gly-based motifs such as elastin and collagen. G-X and S-X dipeptide motifs define either crystalline sheet or amorphous coil regions of fibroin-based silk fibers depending on the polymorphism of X42,43. G-X and S-X content in GSrCPs is loosely conserved compared to silks42,44,45,46, is typically polymorphic between Ala/Ser and to a lesser extent Arg/Lys/Asx/Glx, and does not seem to obey the Pauling-Corey anti-parallel model for fibroin45. Glycine percentages in these domains remain below 40%, also indicating they have a propensity for fibril formation but not for elastic, silk-like properties47.
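The de novo motif queries described above are repeats of a short unit extended to the length of each cement protein. The sketch below shows one way such queries could be built, together with simple tallies of di-residue and glycine content. It does not perform the alignment itself, which in this work was run with EMBOSS Water; the function names and the example lengths are assumptions for illustration.

```python
def tile_motif(motif: str, length: int) -> str:
    """Repeat `motif` and truncate so the result spans exactly `length` residues."""
    if not motif or length < 0:
        raise ValueError("motif must be non-empty and length non-negative")
    repeats = -(-length // len(motif))  # ceiling division
    return (motif * repeats)[:length]


def residue_percent(seq: str, residue: str) -> float:
    """Percentage of positions in `seq` occupied by `residue`."""
    return 100.0 * seq.count(residue) / len(seq)


def di_residue_count(seq: str, residue: str) -> int:
    """Count overlapping doubled residues, e.g. 'SSSS' holds three S-S pairs."""
    return sum(1 for i in range(len(seq) - 1)
               if seq[i] == residue and seq[i + 1] == residue)


def leading_dipeptide_fraction(seq: str, first: str) -> float:
    """Fraction of adjacent residue pairs that start with `first` (e.g. S-X or G-X)."""
    pairs = len(seq) - 1
    if pairs <= 0:
        return 0.0
    return sum(1 for i in range(pairs) if seq[i] == first) / pairs


# Motif units taken from the list in the text; the target length is hypothetical.
for unit in ("SSGG", "GGS", "SSA"):
    query = tile_motif(unit, 12)
    print(unit, query, residue_percent(query, "G"), di_residue_count(query, "S"))
```

A tiled query can then be passed to any local aligner alongside the cement protein sequence; the glycine percentage helper mirrors the below-40% observation made for the low complexity domains.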

Discussion

Integrated proteomic and transcriptomic analysis reveals that nanofibrillar cement in barnacle adhesive is composed primarily of a new and unique family of polar proteins. These proteins are expressed specifically in the adhesive plaque; they were identified using mRNA sequences derived from sub-mantle tissues where cement glands and other cement proteins are found34. GSrCPs were not found in translated mRNA libraries assembled from other regions of the organism (main body, side plates48), or in MS/MS analysis of fluid collected from canals that line the side plates. Furthermore, both filtered and unfiltered searches against all arthropod nrNCBI sequences yield no homologies to known cuticle proteins, which are in intimate contact with the cement13.

Fibers are the dominant ultrastructure in barnacle adhesive. Cement collections display a matted, nanofibrillar ultrastructure12,13,49,50,51, that forms in both sealed and porous attachment conditions, and is readily dissolved by polar solvents used to solubilize biomaterials rich in hydrogen bonds and β-sheet structures52,53,54. Relative abundance of proteins in MS/MS data confirms PAGE observations, where Aa43 and polar GSrCPs occupy a significant fraction of cement. In fact, whole solubilized samples consist of 30–50% identifiable peptides that belong to GSrCPs, with protein abundance index (emPAI) values 5–10 times greater than those of aliphatic LrCP components thought to comprise bulk cement (Table 1). Studies of cement from other species show that homologous proteins are released after exposure to aqueous denaturants, but not in abundance as seen when using HFIP17,18,19. Our findings collectively demonstrate that GSrCPs are a major component of nanofibril ultrastructures, held together by dense hydrogen bonding and released by highly polar solvents.

Many primary sequences throughout the polar GSrCPs are conserved, typically rich in glycine, serine, threonine and alanine residues. These proteins are generally unrelated to previous cement components by composition and sequence; however, many contain repeated blocks aligned with a well-studied 19 kDa cement protein21. Thus, low complexity regions are widely found throughout cement, defining new and chemically unique sequences while relating them to previously identified cement proteins. Low complexity is commonly associated with fibrous materials, ranging from ordered structures (i.e. silks, elastin, and keratin) to disordered domains that become pathological amyloids55. In amyloid formation, glycine-rich domains organize into liquid droplets, a precursor state for long range beta sheet formation55. In cement, silk-homologous proteins encompass the largest cohort in our proteome; 22 sequences maintain e-values with silks ranging from 10⁻¹⁰ to 10⁻³³ while they occupy ca. 30% of all identifiable peptides in cement, where most align to a specific egg stalk silk. Since GSrCPs comprise a large portion of a fibrous material with amyloid-like secondary structure12,14, we believe homology with arthropod silks to be biologically significant. The highly varied compositions of other proteins in the barnacle cement proteome such as LrCPs, SIPCs and other miscellaneous enzymes indicate that alignments are not from a bias in overall barnacle protein composition.

Aquatic arthropods such as caddisfly larvae2,56 and certain amphipods6,7 have recently been found to employ silks as mortar to construct their protective housing underwater. Caddisflies (e.g., H. consimilus and H. occidentalis) belong to the more evolutionarily distant insects, while amphipods (e.g., C. bonellii) are closer neighbors, divergent from barnacles by four ancestors57. These findings underscore the evidence that crustaceans can share adhesive traits with distantly related insects. At least one GSrCP was found homologous to multiple pyriform spidroins, a nanofibrillar cement material used by orb-weaving spiders to attach dragline silk to a solid substrate58. Unlike spun silks, dragline cement is secreted as a viscous fluid that cures over time58,59, a process more in line with observations of barnacle adhesive viscosity60,61 and interface development49,62. Barnacles do not appear to draw fibers from ductwork to deliver adhesion chemistries; rather, the adhesive is more likely shaped through spontaneous protein folding similar to amyloid formation. Additionally, many GSrCPs align with sericin63, a sticky silk gum protein that binds fibers together and to various surfaces. Barnacle adhesives may have evolved homologous properties with these silk-associated cements, which are examples of the first materials identified with shared sequence, ultrastructure and function.

Finally, our results show that barnacle adhesives likely undergo chemical processing subsequent to their self-assembly. Cross-referencing the cement proteome with nrNCBI reveals multiple enzymes within the adhesive collections. Of particular interest are peroxinectins, which have been implicated in catalyzing oxidative crosslinking of caddisworm silk64, and lysyl oxidases, which act to modify lysine side chains and participate in crosslinking collagen and elastin fibrils as well as cuticular tissues39. The high number of lysines in exposed complex domains of GSrCPs suggests that they would be available for cross-linking. Phosphorylation of the abundant serine and threonine residues in GSrCPs presents another possible modification, as has been identified in many aquatic2,65 adhesives including secretions from cyprids and adult barnacles66,67. Indeed, kinases were identified in transcriptome sequencing of the cement gland region34. However, our analysis of the solubilized cement did not reveal these entries or any other kinases. Previous infrared spectroscopy of the adhesive from underneath live barnacles and reflected from the top of demineralized cement plaques has shown little evidence for organophosphate bonds, although x-ray photoelectron spectroscopy of newly deposited adhesive revealed the presence of a small amount of phosphate13,30. Abundant basic residues found throughout the cement proteome (40% of the proteins have a pI above 10) could serve to aid barnacle adhesion; for example, lysines and arginines have recently been found essential in displacing saltwater cations to allow direct interaction with surface oxides16,68. The identification of relationships to other silk-producing arthropods and functional molecular chemistries offers a first glimpse into how barnacle cement is constructed and functions. Future work includes identifying what role the enzymes may play, the extent of post-translational modifications to GSrCPs, and how these materials fit into the curing mechanism of the cement.

Methods

Informatics analysis of adhesive proteome

Informatics software used in this work: Gepard 1.4 with a BLOSUM62 scoring matrix (University of Vienna, Austria) using a window of 9 for dot plot analysis (displaying equal regions of the histogram), Cluster 3.0 (University of Tokyo, Japan) for hierarchical clustering of e-values (distances measured by uncentered Pearson correlation, clustered using mean linkages), and TreeView 3.0 (Stanford, USA) to generate 2D correlated e-values in Fig. 2d and Table 1. BLASTP (NIH, USA) 2.2.32+ was used for annotation of proteome entries, using a BLOSUM62 scoring matrix with gap penalties of 11 for existence and 1 for extension. RADAR (EMBL-EBI, UK) was used for self sequence alignment, Clustal Omega 1.2.1 (EMBL-EBI, UK) was used for multiple sequence alignment and EMBOSS Needle (EMBL-EBI, UK) with a gap open penalty of 50 was used for pairwise alignment of silk-like segments identified by dot plots from Fig. 5. EMBOSS Water (EMBL-EBI, UK) with a gap open penalty of 20 was used to find local alignments with de novo silk-like motif sequences to keep the percentage of gaps below 5%. Boxshade 3.21 was used to display alignment files with a 0.5 threshold of homology or identity for shading. IBS 1.0 (Sun Yat-sen University, China) was used to generate protein schema.
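The windowed dot plot comparison described above can be illustrated with a minimal sketch. This is not Gepard's algorithm: identity counting is swapped in for BLOSUM62 scoring to keep the example self-contained, and the function name `dot_plot`, the `min_matches` cutoff and the toy repeat sequence are hypothetical choices made only for illustration. Only the window of 9 comes from the text.

```python
def dot_plot(seq_a, seq_b, window=9, min_matches=6):
    """Return (i, j) pairs whose windows share at least min_matches identities.

    Simplified stand-in for a matrix-scored dot plot: each window of
    `window` residues starting at i in seq_a is compared position by
    position with the window starting at j in seq_b.
    """
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= min_matches:
                hits.append((i, j))
    return hits


# Hypothetical toy repeat (not a sequence from this work). Comparing a
# sequence against itself exposes internal repeats as off-diagonal hits.
toy = "GSGSGSGSGSGS"
hits = dot_plot(toy, toy)
off_diagonal = [(i, j) for i, j in hits if i != j]
```

In a self-comparison the main diagonal is always populated, so it is the off-diagonal hits that indicate repetitive, low-complexity segments of the kind discussed for the silk-like regions.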

For sequence assignment of MS/MS data, assembled cDNA generated from RNA-seq performed previously34 was translated into a FASTA database (1044910 total entries) in six reading frames with the EMBOSS transeq command (EMBL-EBI, UK). All identified transcript sequences from PAGE samples were then combined into a subset FASTA database and uploaded to BLASTP for pairwise alignment and e-value analysis. The natural logs of all pairwise e-values were then submitted to Cluster 3.0 and plotted using a two-tone look up table with TreeView. Clusters with e-values of 10⁻⁴ or lower were identified as a family and entered into Table 1. Grand average of hydropathicity (GRAVY) values were established using the system defined by Kyte and Doolittle69. Aliphatic index values were determined using methods defined by Ikai70. GRAVY and aliphatic index values were calculated using ProtParam (SIB, Switzerland) software.
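For readers who wish to reproduce the two sequence descriptors without ProtParam, the definitions can be sketched as follows. GRAVY is the mean Kyte-Doolittle hydropathy over all residues, and the aliphatic index is X(Ala) + 2.9 X(Val) + 3.9 [X(Ile) + X(Leu)], where X is mole percent. The function names and the short test strings are illustrative assumptions and are not sequences from this work.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy per residue.

    Raises KeyError on non-standard residue codes (e.g., X or B).
    """
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    n = len(seq)

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / n

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


# Hypothetical short strings, used only to exercise the functions.
gravy_gs = gravy("GS")                  # hydrophilic Gly/Ser pair
ai_avil = aliphatic_index("AVIL")       # fully aliphatic peptide
```

Negative GRAVY values indicate an overall hydrophilic sequence and positive values a hydrophobic one.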

To find all possible cement proteins, the full transcriptome was searched for additional entries using the two domains identified in the 43-like and 19-like protein families using a hidden Markov search. Two additional sequences belonging to the 43-like family and one with homology to AaCP19 were found, making a total of six 43-like proteins and nine 19-like proteins. When this motif was used to search the broader NCBI database, only the previously identified 19 kDa proteins were retrieved.

Barnacle Husbandry

A. amphitrite cyprids were settled on silicone-coated glass panels and reared at the Duke University Marine Laboratory as previously described71. Panels with adult barnacles were shipped to the Naval Research Laboratory (NRL) where they were maintained in an incubator operating at 23 °C with 12 h day/night cycles in 32 ppt artificial seawater (Instant Ocean, Blacksburg, VA). The barnacles were fed Artemia spp. nauplii (Brine Shrimp Direct, Ogden, UT) three times a week and the artificial seawater was changed once a week during which the algal growth was removed. Barnacles to be used for experiments were gently dislodged from the silicone-coated glass panels13, rinsed with distilled water, and placed on alternate substrates for the experiments.

Cement Collection on Glass Microspheres

Barnacles were placed onto a 2–3 mm deep bed of soda lime glass microspheres (48–85 μm) (Cospheric, Santa Barbara, CA) in artificial seawater (ASW). After one week, barnacles were lifted off, and microspheres associated with their underside were gently transferred without damaging the barnacle. The microspheres were pooled from multiple animals (n = 6), rinsed 3 times in fresh D.I. water, and stored at 4–8 °C until use. Microspheres untouched by barnacles from the same dish were also collected and rinsed for background measurements.

Collection of Opaque Adhesive

Adult barnacles that develop a thick white opaque adhesive (a.k.a., gummy glue) were gently removed from silicone panels and the ‘opaque’ glue was shaved or peeled off the baseplate without damaging the barnacle using an angled razor blade. ‘Opaque’ glue pieces were rinsed with D.I. water, pooled from multiple animals (n = 3) and placed in enough hexafluoroisopropanol (HFIP) to cover the pieces.

Collection from Rinsed Plaques

Rinsed plaques were collected using previously developed methods34. Briefly, barnacles were transferred from silicone panels and settled on sodium aluminoborate (Na2O ∙ Al2O3 ∙ 3B2O3) glass substrates, which form a hydrated reaction layer (<25 μm thick) in aqueous environments that is resistant to barnacle adhesion38. After 8 weeks, barnacle bodies and side plates were carefully removed, leaving the base of the barnacles attached to the substrates, which were then cleaned with a cotton swab in deionized water to remove loosely bound organic matter. Base plates were then demineralized by immersing substrates in 0.1 M ethylenediaminetetraacetic acid (EDTA) at room temperature for 48–72 h. Barnacle “medallions,” consisting of the cuticular layer and underlying barnacle secretions, were gently rinsed with deionized water, then peeled off the aluminoborate glass substrates and pooled (n = 3) in 50 μL of Laemmli sample buffer containing 300 mM dithiothreitol (DTT).

SDS-PAGE

‘Microsphere’ and ‘opaque’ samples were prepared by adding two bed volumes of HFIP to the microspheres and opaque glue collections and then sonicating in a water bath for 1 hour at room temperature. The HFIP was removed and evaporated to dryness by vacuum centrifuge. Next, for all three sample types, 25 μL of Laemmli sample buffer containing 300 mM DTT and 25 μL of D.I. water were added and incubated for 15 min at 95 °C.

The samples were separated by SDS-PAGE at 200 V constant on Any kD Mini-PROTEAN TGX precast gels (Bio-Rad, Hercules, CA) using Tris-SDS running buffer (25 mM Tris, 192 mM glycine, and 0.1% SDS, pH 8.3). Gels were then stained with either Bio-Safe Coomassie Stain (Bio-Rad, Hercules, CA) or Imperial protein stain (Thermo Fisher Scientific) for visualization.

Amino Acid Analysis

Microsphere cement samples were subjected to acid hydrolysis using 6 N aqueous HCl in a vacuum hydrolysis tube heated to 150 °C for 1.5 hours. Phenol was added to suppress the halogenation of tyrosine residues and thioglycolic acid to suppress cysteine oxidation. For quantification of amino acids, hydrolyzed samples were transferred to 6 mm wide borosilicate culture tubes, dried down by vacuum centrifugation and resuspended in 2:2:1 volumes of EtOH:H2O:triethylamine (TEA). Samples were dried again, derivatized with phenylisothiocyanate (PITC) in a 7:1:1:1 solution (EtOH:H2O:PITC:TEA) for 30 min at room temperature, and dried. Analytical HPLC was carried out on an Agilent Infinity 1260 at 300 μL/min using a C18 4.6 × 150 mm Poroshell 120 column (2.7 μm bead-size, Agilent) heated to 38 °C, monitoring eluent absorbance at 250 nm. Solvent A was a 30 mM sodium acetate (pH 5.5) solution; Solvent B was 18 mM sodium acetate in a 70/30 acetonitrile/water mixture (pH 3.8), using an elution gradient by Smith et al.72. Elution times for each amino acid were established by first running PITC-derivatized amino acid Standard H (Pierce Scientific).

Amino acid analysis of the isolated 63 kDa PAGE band was performed by dissolving roughly 1 mg (hydrated mass) of ‘opaque’ glue in HFIP, drying by vacuum centrifugation, resuspending in sample buffer, and running 100 μL on a 7 cm single-lane gradient gel (Any kD TGX, Bio-Rad, Hercules, CA). The gel was stained using Bio-Safe Coomassie (Bio-Rad, Hercules, CA) and immediately transferred to a 7 × 8.4 cm PVDF membrane (Sequi-Blot, Bio-Rad, Hercules, CA) using 10 mM CAPS running buffer (10% methanol, pH 11.0, vacuum degassed and filtered) under constant 100 mA current for 1.5 h. A bright band at 63 kDa was excised with a razor blade and sent to Bio-Synthesis Inc. (Lewisville, TX) for amino acid analysis. PVDF bands were hydrolyzed in 6 N HCl for 24 h at 110 °C. The hydrolysate was dried and derivatized pre-column using 200 μL of aqueous 6-aminoquinolyl-N-hydroxysuccinimidyl carbamate (AQC) (AccQ-Tag, Waters) solution. A 10 μL aliquot was injected onto a Waters Breeze 2 HPLC, where amino acid fluorescence and abundance were quantified by a Waters 2475 multi-λ fluorescence detector. Data are shown in Supplementary Figure S3.

PCR of GSrCP-Aa43-1

Total RNA from sub-mantle tissues was isolated as described previously34 and reverse transcribed to cDNA according to the manufacturer’s protocol (Life Technologies). The first-strand cDNA was then used as a template along with the primers AaCP43 F1 (5′ ATGCTGCCTGCCGCGATCC) and AaCP43 R1 (5′ CTACCATTTAGGCTTATAC) to amplify the GSrCP-Aa43-1 transcripts by PCR with the following steps: 95 °C for 30 sec, 56 °C for 1 min and 30 sec, 70 °C for 1 min and 30 sec, for 35 cycles. The resulting 1344 bp DNA fragments were isolated and then sequenced by the Sanger method (Eurofins MWG Operon; Louisville, KY). The assembled sequence was then aligned with the transcript comp41238_c0_seq1 using Clustal Omega (EMBL-EBI, UK). All RT-PCR related reagents and enzymes were purchased from Life Technologies (Carlsbad, CA) and reagents for DNA isolation were from Qiagen Inc (Valencia, CA).
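As a quick sanity check on the primer pair, the short sketch below reverse-complements the reverse primer to recover the sense-strand 3′ end of the amplicon and tests for start and stop codons. The primer sequences and the 1344 bp length are taken from the text; the helper name `reverse_complement` and the assumption that the amplicon spans a single uninterrupted reading frame from start to stop codon are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


FORWARD_PRIMER = "ATGCTGCCTGCCGCGATCC"   # AaCP43 F1, 5' to 3'
REVERSE_PRIMER = "CTACCATTTAGGCTTATAC"   # AaCP43 R1, 5' to 3'
AMPLICON_LENGTH_BP = 1344

# Sense-strand sequence at the 3' end of the amplicon.
sense_3prime_end = reverse_complement(REVERSE_PRIMER)

starts_with_atg = FORWARD_PRIMER.startswith("ATG")
ends_with_stop = sense_3prime_end[-3:] in {"TAA", "TAG", "TGA"}

# Assumption: if the amplicon is one complete open reading frame,
# its length must be a whole number of codons (stop codon included).
in_frame = AMPLICON_LENGTH_BP % 3 == 0
codon_count = AMPLICON_LENGTH_BP // 3
```

Under that assumption, the amplicon would correspond to `codon_count` codons including the terminal stop codon.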

Tandem Mass Spectrometry and Sequence Assignment

Samples analyzed at NRL were processed as individual bands from protein extracts of each sample separated by SDS-PAGE, excised and digested in gel by trypsin. Peptides were extracted by 2% formic acid in 50/50 acetonitrile/water, followed by 100% acetonitrile. Digests were analyzed by liquid chromatography mass spectrometry/mass spectrometry (LC-MS/MS) using a Tempo-MDLC coupled to a TripleTOF 5600 mass spectrometer (AB Sciex, Foster City, CA). Tandem mass spectra were extracted by AB Sciex MS data convertor version 2. Aggregate samples were analyzed at the Texas A&M Protein Chemistry Lab (TAMU, College Station, TX) where protein extracts were run on SDS-PAGE for 10 minutes (see Fig. 1b), divided into four segments, digested by trypsin and extracted. Digests were analyzed by LC-MS/MS using a NanoLC 2-D (Eksigent, Dublin, CA) coupled to a LTQ Orbitrap Velos H/ETD (Thermo Scientific, Waltham, MA). Tandem mass spectra were extracted by Mascot Distiller (Matrix Science, London, UK) software. Two 62 kDa bands from ‘opaque’ and ‘microsphere’ cement samples were sent to BioProximity (BioP, Chantilly, VA) for multiple enzyme digestion and sequencing. Bands were cut into six pieces, where each piece was digested with a single enzyme: trypsin, chymotrypsin, Glu-C, alpha-lytic protease, pepsin and thermolysin. The six digestion products were then combined and injected for LC-MS/MS analysis. Digests were analyzed by LC-MS/MS using an Easy-nLC 1000 coupled to a Q Exactive Quadrupole-Orbitrap (ThermoFisher, Waltham, MA). Charge state deconvolution and deisotoping were not performed. All MS/MS samples were analyzed using Mascot (Matrix Science, London, UK; version 2.4.1) and X! Tandem (The GPM, thegpm.org; version CYCLONE (2010.12.01.1)). To translate assembled cDNA sequences generated from RNA-seq experiments34 into a searchable FASTA database, EMBOSS transeq command was used with 6 open reading frames. 
Mascot was set up to search the BarnALL_001 database (1045268 entries) assuming the digestion enzyme trypsin. X! Tandem was set up to search a subset of the BarnALL_001 database also assuming trypsin. Mascot and X! Tandem were searched with a fragment ion mass tolerance of 0.60 Da and a parent ion tolerance of 20 ppm for aggregate samples analyzed at TAMU. Samples analyzed at Bioproximity were searched with a fragment tolerance of 20 ppm and a parent ion tolerance of 0.8 Da. Samples analyzed at NRL were searched with a fragment tolerance of 0.2 Da and a parent ion tolerance of 0.2 Da. Deamidation of asparagine and glutamine, oxidation of methionine, acetylation of the N-terminus, and carbamidomethylation of cysteine were specified in Mascot as variable modifications. Glu->pyro-Glu of the N-terminus, ammonia-loss of the N-terminus, Gln->pyro-Glu of the N-terminus, deamidation of asparagine and glutamine, oxidation of methionine, acetylation of the N-terminus, and carbamidomethylation of cysteine were specified in X! Tandem as variable modifications. Scaffold (version Scaffold_4.6.1, Proteome Software Inc., Portland, OR) was used to validate MS/MS based peptide and protein identifications. Peptide identifications were accepted if they could be established at greater than 80.0% probability by the Peptide Prophet algorithm73 with Scaffold delta-mass correction. Protein identifications were accepted if they could be established at greater than 95.0% probability and contained at least 4 identified peptides. The Venn diagram in Fig. 2a was produced using a minimum of 4 identified peptides. Protein probabilities were assigned by the Protein Prophet algorithm74. Proteins that contained similar peptides and could not be differentiated based on MS/MS analysis alone were grouped to satisfy the principles of parsimony.
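Because the search tolerances above mix relative (ppm) and absolute (Da) units, a short sketch of the conversion and of the stated acceptance thresholds may be helpful. This is a simplified illustration and not the Scaffold, Peptide Prophet or Protein Prophet algorithms; the function names and example values are hypothetical, and only the 80%, 95% and 4-peptide thresholds come from the text.

```python
def ppm_to_da(mass, ppm):
    """Convert a relative tolerance in ppm to an absolute window in Da."""
    return mass * ppm / 1e6


def within_tolerance(observed, theoretical, tolerance, unit="ppm"):
    """Check whether an observed mass matches a theoretical mass."""
    delta = abs(observed - theoretical)
    if unit == "ppm":
        return delta <= ppm_to_da(theoretical, tolerance)
    if unit == "Da":
        return delta <= tolerance
    raise ValueError("unit must be 'ppm' or 'Da'")


def accept_protein(protein_prob, peptide_probs,
                   min_protein_prob=0.95, min_peptide_prob=0.80,
                   min_peptides=4):
    """Apply the stated thresholds to precomputed probabilities.

    A protein passes if its probability exceeds min_protein_prob and at
    least min_peptides of its peptides exceed min_peptide_prob.
    """
    accepted = [p for p in peptide_probs if p > min_peptide_prob]
    return protein_prob > min_protein_prob and len(accepted) >= min_peptides


# Hypothetical example: a 20 ppm window at a mass of 1000 Da.
window_da = ppm_to_da(1000.0, 20)
```

At a mass of 1000 Da, a 20 ppm tolerance corresponds to a window of 0.02 Da, which illustrates how much tighter a ppm-scale tolerance is than the 0.2 to 0.8 Da absolute tolerances quoted above.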

The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE75 partner repository with the dataset identifier PXD004293 and 10.6019/PXD004293.

Additional Information

How to cite this article: So, C. R. et al. Sequence basis of Barnacle Cement Nanostructure is Defined by Proteins with Silk Homology. Sci. Rep. 6, 36219; doi: 10.1038/srep36219 (2016).

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.