The Nephila clavipes genome highlights the diversity of spider silk genes and their complex expression

Journal name: Nature Genetics
Volume: 49
Pages: 895–903
DOI: 10.1038/ng.3852

Abstract

Spider silks are the toughest known biological materials, yet are lightweight and virtually invisible to the human immune system, and they thus have revolutionary potential for medicine and industry. Spider silks are largely composed of spidroins, a unique family of structural proteins. To investigate spidroin genes systematically, we constructed the first genome of an orb-weaving spider: the golden orb-weaver (Nephila clavipes), which builds large webs using an extensive repertoire of silks with diverse physical properties. We cataloged 28 Nephila spidroins, representing all known orb-weaver spidroin types, and identified 394 repeated coding motif variants and higher-order repetitive cassette structures unique to specific spidroins. Characterization of spidroin expression in distinct silk gland types indicates that glands can express multiple spidroin types. We find evidence of an alternatively spliced spidroin, a spidroin expressed only in venom glands, evolutionary mechanisms for spidroin diversification, and non-spidroin genes with expression patterns that suggest roles in silk production.

At a glance

Figures

  1. Figure 1: A catalog of spidroin genes from the golden orb-weaver spider.

    Phylogenetic tree showing the evolutionary relationship among the assembled N. clavipes spidroins using N-terminal sequences (~130 residues) from each putative gene product (bootstrap values provided in Supplementary Fig. 3a). Circular symbols denote the silk class—with putative functional application and presumed gland of origin (Supplementary Fig. 1b,c)—of each spidroin, determined by alignment to known spidroin sequences (spidroins that did not cluster are designated “unknown”). Genic structures are drawn to scale. N-terminal domains are colored green, and C-terminal domains are colored red. The illustration of the arrays of repeated motifs found within the internal coding regions is simplified, symbolized by alternating light and dark gray bars. Non-repetitive coding 'spacer' sequences (pink), scaffold gaps (black bars), 'linked' scaffolds validated by long-range PCR (string of “NNNNN”), and flanking noncoding or intronic sequences (thin lines and arrows) are also shown.

  2. Figure 2: The frequency and distribution of repetitive motifs found across N. clavipes spidroin genes.

    (a) Summary of the 394 distinct repetitive motifs found in N. clavipes spidroins, grouped by amino acid sequence. The most frequently observed motifs (≥30 occurrences) are listed here; “X” indicates a variable amino acid position. Our motif catalog includes known motifs (dark gray), new variants of known groups (light blue), and novel motifs not previously described in the literature (gold). Motifs that are repeated less frequently (<30 occurrences) or cannot be informatively grouped are designated “additional” and “unassigned” (purple), respectively. The asterisks indicate a diverse group that cannot be informatively exemplified. The complete list of N. clavipes motifs is provided in Supplementary Table 13. Circular symbols indicate the spidroin classes in which each motif group is observed. (b) Bar graph showing the extent of repetitive motif coverage in the structure of each spidroin. Beside each bar, we provide the percentage of the internal coding region composed of repetitive motifs calculated for fully assembled spidroins (NC, not calculated).

  3. Figure 3: Repetitive motifs are extensively shared across spidroins.

    (a) Bar graph comparing the number of shared (gold) versus private (dark gray) distinct repetitive motifs observed in each spidroin. (b–g) Circos plots illustrate sharing of motif sequences among N. clavipes spidroins, specifically showing sharing of sequences belonging to known motif groups (b), known motif groups with novel N. clavipes variants (c), novel motif groups (d), and unassigned motifs (e). In these plots, genes are arrayed around the circle, and links are drawn to connect similar motif sequences that occur in both genes. (f) Circos plot showing the extensive sharing of motifs between novel FLAG-b (VeSp) and the other N. clavipes spidroins, supporting classification of FLAG-b as a spidroin. (g) The Circos plot of FLAG-a shows a distribution of motif sharing similar to that seen for FLAG-b and further highlights the lack of motif sharing of either gene with AgSp-d.

  4. Figure 4: The frequency and distribution of cassettes found across N. clavipes spidroin genes.

    (a) Summary of 506 distinct higher-order repetitive structures (cassettes) found in N. clavipes spidroins. Cassettes are grouped according to the motif types from which they are composed. For cassettes composed of more than one distinct motif, the amino acids belonging to each motif are specified by color (red or black). The asterisk indicates cassettes that are too large to display here or that cannot be informatively exemplified. A list of all N. clavipes cassettes is provided in Supplementary Table 14. Circular symbols indicate the spidroin classes in which each cassette group is observed. (b) Bar graph comparing the number of shared (gold) versus private (dark gray) distinct repetitive cassette types observed in each spidroin. Beside each bar, we provide the percentage of repetitive motifs composed of cassettes calculated for fully assembled spidroins (NC, not calculated).

  5. Figure 5: Spidroin gene expression in N. clavipes.

    (a) Box-and-whiskers plots showing the relative expression of four N. clavipes spidroin loci in individual tissue dissections (n = 3 biological replicates per tissue) assayed by qPCR. Tissues including legs, venom glands, five anatomically distinct silk glands, and 'other' silk glands (aciniform and piriform glands attached to spinneret) are shown on the x axis, and expression (2^(−ΔΔCT) method) is depicted on the y axis (log10 scale). Box-and-whiskers plots show the range of expression values of the given spidroin gene (left y axis) relative to RPL13a (housekeeping gene) expression and normalized to leg tissue. Thick black center lines represent median values. Upper whiskers represent largest observation ≤ upper quartile (Q3) + 1.5 interquartile range (IQR), and lower whiskers represent smallest observation ≥ lower quartile (Q1) − 1.5 IQR. Red asterisks mark silk glands with significantly greater expression of a given gene over leg tissue, whereas black asterisks indicate a single silk gland type exhibiting significantly greater expression values of a given gene than all other silk gland types together (one-tailed Wilcoxon rank-sum tests): *P < 0.05, **P < 0.01. (b) Heat map showing patterns of co-expression among N. clavipes spidroins as assayed by qPCR across venom and silk glands. Co-expression scores were calculated using Pearson correlation of relative expression values (2^(−ΔΔCT)) for each pair of genes and plotted using single-linkage hierarchical clustering. Owing to sequence similarity between MaSp-b and MaSp-c, the data for these two transcripts are presented together as “MaSp-b,c”. (c) Box-and-whiskers plot showing the expression of three genes (FLAG-a, FLAG-b (VeSp), and PR-1), assayed using qPCR in flagelliform silk glands (left three boxes) and venom glands (right three boxes) collected from three mature N. clavipes females (n = 3 tissue samples for each type). Significantly greater expression of FLAG-a was detected in flagelliform silk glands, whereas significantly greater expression of PR-1 (a venom-specific toxin gene used as a control) and FLAG-b was detected in the venom glands. Box and whiskers show the range of relative expression values (calculated using the 2^(−ΔΔCT) method) of each of the three genes for both tissue types relative to RPL13a (a housekeeping gene) expression in each tissue and normalized to leg tissue. Thick black center lines represent median expression values. Upper whiskers represent largest observation ≤ upper quartile (Q3) + 1.5 interquartile range (IQR), and lower whiskers represent smallest observation ≥ lower quartile (Q1) − 1.5 IQR. Black asterisks indicate that a given gene exhibits significantly greater (FLAG-a) or lower (FLAG-b, PR-1) expression in flagelliform silk glands versus venom glands (n = 3 samples per gland, one-tailed Wilcoxon rank-sum test): *P = 0.05. (d) Evidence of alternative spliceoforms of MaSp-f. Junction reads mapping to two distinct isoforms were detected. Junction reads mapping to MaSp-f isoform 2 were observed in silk 1 and silk 2 gland isolates, as indicated by the heat map beneath the isoform cartoon.

  6. Supplementary Fig. 1: The golden orb-weaver spider’s morphology, reported silk gland anatomy, and web construction.

    (a) Photographs of N. clavipes showing an adult female at the center hub of her orb web (left) and a view of the spinneret silk-extruding organs on the underside of the female abdomen (right). (b) Silk gland anatomy of N. clavipes, showing the seven different female araneoid gland morphologies found in the abdomen and the different classes of silk proteins produced. Each silk class has specific physical characteristics; for example, the minor and major ampullate spidroins produce silks with great tensile strength, flagelliform silk has great extensibility, and aggregate silks are non-fibrous sticky glue. This illustration (inspired by ref. 52) shows one set of silk glands and spinnerets from a bilateral pair and indicates that each gland type produces a specific type of silk. However, our expression data (Fig. 5a) suggest that this is not the case, supporting previous findings (refs. 48, 50, 53) that individual glands can express multiple classes of spidroins. Note: the gland type coloration scheme and corresponding silk use pictograms defined here are used in later figures. (c) Putative applications of spider silk types in web construction (web diagram adapted from ref. 54), as described in previous studies. (i) Web building and maintenance: major ampullate silk is used for bridgelines and web radii; minor ampullate silk is used for the temporary spiral; piriform silk attaches fibers together and to substrates; flagelliform silk is used for the capture spiral; aggregate silks are sticky, aiding in adherence and prey capture. (ii) Prey wrapping: aciniform (top inset photo). (iii) Silk egg casings: tubuliform (bottom inset photo). References for silk classes and their purported uses are listed in the main text. (Photos provided by P.L.B.)

  7. Supplementary Fig. 2: Maximum-likelihood phylogenetic gene tree of 28 N. clavipes spidroins in the context of 55 spidroins from other spider taxa.

    The spidroin gene tree is rooted with a Bothriocyrtum californicum fibroin sequence (B.c. fibroin1; accession HM752562) and is based on multiple-sequence alignment (MSA) using the first ~130 amino acid residues of the N-terminal domain encoded by each gene. MSA was performed with Geneious, Clustal, and BLOSUM62, and the consensus tree was built with PhyML (Supplementary Note). Bootstrap proportions >50 (based on 1,000 replicates) are shown to the left of their respective nodes. Colors follow the gland/spidroin class designations shown in Supplementary Figure 1. Codes and accession numbers for different spidroins and taxa are listed in Supplementary Table 10.

  8. Supplementary Fig. 3: Maximum-likelihood phylogenetic gene trees for the catalog of 28 spidroins identified in N. clavipes.

    (a,b) Unrooted maximum-likelihood phylogenetic trees for the catalog of 28 spidroins identified in N. clavipes, shown as both transformed (a) and non-transformed (b) layouts. Both trees are based on a multiple-sequence alignment (MSA) using the first ~130 amino acid residues of the N-terminal domain for each N. clavipes spidroin. MSA was performed with Geneious, Clustal, and BLOSUM62, and the consensus tree was built with PhyML (Supplementary Note). Bootstrap proportions >50 (based on 1,000 replicates) are shown to the left of their respective nodes. Colors follow the gland/spidroin class designations shown in Supplementary Figure 1. Codes for N. clavipes spidroins are listed in Supplementary Table 12.

  9. Supplementary Fig. 4: Agarose gel images of long-range PCR-amplified MiSp sequences used for validation of draft assembly, scaffold bridging, and gap closure.

    The top panel highlights a single lane with an LR-PCR reaction (golden rectangle) for MiSp-c. The bottom panel highlights four lanes with LR-PCR reactions (golden rectangle) for MiSp-d. In both cases, multiple large bands are visible, indicating amplification of multiple targets that presumably represent genomic regions with high sequence similarity to the binding sites of the oligonucleotide primers used to isolate both MiSp types.

  10. Supplementary Fig. 5: Distribution of amino acid frequency for N. clavipes ‘gold’ gene models.

    Amino acid frequency distributions were calculated for all 20 amino acids for all mRNA transcripts from the gold gene model set (n = 17,989 mRNA sequences). Several spidroins were found at the extreme ends of the individual amino acid distributions (Supplementary Fig. 5 and Supplementary Note). Box-and-whisker plots show the range of frequency values (y axis) for the given amino acid residues (x axis). Thick black center lines represent median values. Upper whiskers represent the largest observation ≤ the upper quartile (Q3) + 1.5 interquartile range (IQR), and lower whiskers represent the smallest observation ≥ the lower quartile (Q1) – 1.5 IQR.

  11. Supplementary Fig. 6: Distribution of amino acid frequency for 28 N. clavipes spidroin genes.

    Amino acid frequency distributions were calculated for all 20 amino acids for all N. clavipes spidroin genes (n = 28 sequences). Box-and-whisker plots show the range of frequency values (y axis) for the given amino acid residues (x axis). Thick black center lines represent median values. Upper whiskers represent the largest observation ≤ the upper quartile (Q3) + 1.5 interquartile range (IQR), and the lower whiskers represent the smallest observation ≥ the lower quartile (Q1) – 1.5 IQR. Overall, spidroins exhibit enrichment of alanine, glycine, and serine residues, which have significantly different proportions when compared to 17,989 mRNA sequences from the gold gene set (Wilcoxon rank-sum test; Supplementary Fig. 5 and Supplementary Note). **P < 0.01.

  12. Supplementary Fig. 7: Shared and private motif occurrences in N. clavipes spidroins.

    Bar graph comparing the number of shared (gold) versus private (dark gray) distinct repetitive motif occurrences observed in the different N. clavipes spidroins (n = 28 sequences).

  13. Supplementary Fig. 8: Shared and private cassette occurrences in N. clavipes spidroins.

    Bar graph comparing the number of shared (gold) and private (dark gray) distinct repetitive cassette occurrences observed in the different N. clavipes spidroins (n = 28 sequences).

  14. Supplementary Fig. 9: RNA–seq expression patterns of spidroin genes in 13 N. clavipes tissue samples.

    Heat map showing the absolute number of normalized RNA–seq reads that map to spidroin transcripts, assayed in ten individual silk glands, one venom gland isolate, and two brain isolates collected from two non-gravid females, Nep-008 and Nep-009. Owing to extensive sequence similarity between MaSp-b and MaSp-c, it was not possible to distinguish between reads that mapped to these two spidroins; thus, data for these two transcripts are presented together as “MaSp-b,c”. Reads mapping to MaSp-h and AgSp-c exceeded the heat map’s informative range; thus, we have included bar graph insets (right) confirming that reads mapping to MaSp-h (top inset) and AgSp-c (bottom inset) are substantially more abundant in silk glands than in venom gland or brain.

  15. Supplementary Fig. 10: Distributions of relative expression values for 29 N. clavipes genes in seven tissue types.

    Box-and-whisker plots of the relative expression for all 28 N. clavipes spidroin genes and 1 venom gene (PR-1) in tissue dissections (n = 3 independent-specimen biological replicates per tissue) assayed by qPCR. Tissues (venom glands, five anatomically distinct silk glands, and ‘other’ silk glands, that is, aciniform and piriform glands attached to the spinneret) are labeled to the left of the y axis, and relative expression (2^(−ΔΔCT) method; ref. 46) is depicted on the y axis (log10 scale), with plots organized in rows by tissue type. Box-and-whisker plots show the range of expression values for the given genes (x axis) relative to RPL13a (housekeeping gene) expression and normalized to leg tissue. Thick black center lines represent median values. Upper whiskers represent the largest observation ≤ the upper quartile (Q3) + 1.5 interquartile range (IQR), and the lower whiskers represent the smallest observation ≥ the lower quartile (Q1) – 1.5 IQR. Asterisks indicate a single silk gland type exhibiting significantly greater expression values for a given gene versus all other silk gland types together (Wilcoxon rank-sum test). **P < 0.01.

  16. Supplementary Fig. 11: Mean relative expression values of 29 N. clavipes genes in seven tissue types.

    Heat map showing the relative expression of N. clavipes spidroin loci in tissue dissections (n = 3 biological replicates per tissue) assayed by qPCR. Tissues included venom glands, five anatomically distinct silk glands, and ‘other’ silk glands (aciniform and piriform glands attached to the spinneret) and are shown on the x axis, with spidroins arranged on the y axis. The heat map panels depict relative mean fold change in gene expression (2^(−ΔΔCT) method; ref. 46) per tissue (distinct tissue dissections from n = 3 individuals) over RPL13a and normalized to leg tissue.

  17. Supplementary Fig. 12: RNA–seq expression patterns of SSTs in 13 N. clavipes tissue samples.

    Heat map showing the absolute number of normalized reads that map to 649 non-spidroin silk gland–specific transcripts (SSTs), assayed in ten individual silk glands, one venom gland, and two brain isolates collected from two non-gravid females, Nep-008 and Nep-009. SSTs are vertically clustered based on the filtering method used for discovery (Supplementary Note), as noted by colored vertical bars at the right of the heat map. The categories defined on the left are described in Supplementary Table 15.

  18. Supplementary Fig. 13: Polymorphism levels of genes and genic features in the N. clavipes genome.

    (a) Box-and-whisker plot comparing the distribution of θW values (ref. 43), derived from SNP counts, for 14,025 gold gene sequences with the distribution of θW values for 28 N. clavipes spidroins. Box-and-whisker plots show the range of θW values for each gene set. Thick black center lines represent median values. Upper whiskers represent the largest observation ≤ the upper quartile (Q3) + 1.5 interquartile range (IQR), and the lower whiskers represent the smallest observation ≥ the lower quartile (Q1) – 1.5 IQR. Asterisks indicate that the 28 N. clavipes spidroin genes exhibit significantly greater θW values than the collected gold gene set (Wilcoxon rank-sum test; Supplementary Note). **P < 0.01. (b) Vertical bar graph showing the mean θW values for 11 genomic feature categories, including many gold gene model subfeatures, in comparison to the mean θW values for N. clavipes spidroins, silk N termini, and silk C termini. (c) Bar graph depicting the θW values for individual N. clavipes spidroins.
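
Supplementary Fig. 13 reports θW values derived from SNP counts. As a rough illustration only, Watterson's estimator can be sketched as below. The function and parameter names are hypothetical, and the sample size, site filtering, and per-site normalization actually used for the figure are described in the Supplementary Note rather than here, so every input shown is an assumption.

```python
def harmonic_number(k: int) -> float:
    """Return 1 + 1/2 + ... + 1/k (0.0 when k is 0)."""
    return sum(1.0 / i for i in range(1, k + 1))


def watterson_theta(num_segregating_sites, num_chromosomes, sequence_length=None):
    """Watterson's estimator: theta_W = S / a_n, with a_n = sum_{i=1}^{n-1} 1/i.

    num_segregating_sites: SNP count S observed in the region (assumed input).
    num_chromosomes: number of sampled chromosomes n (assumed input, n >= 2).
    sequence_length: if given, return theta_W per site instead of per region.
    """
    if num_chromosomes < 2:
        raise ValueError("at least two sampled chromosomes are required")
    a_n = harmonic_number(num_chromosomes - 1)
    theta = num_segregating_sites / a_n
    if sequence_length is not None:
        theta /= sequence_length
    return theta


# Hypothetical example: 11 SNPs among 4 chromosomes in a 1,000 bp gene.
theta_region = watterson_theta(11, 4)
theta_per_site = watterson_theta(11, 4, sequence_length=1000)
```

Dividing by the sequence length makes values comparable between genes of different lengths, which matters when a long, repetitive spidroin is compared with an average gene model.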

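The relative expression values in Fig. 5 and Supplementary Figs. 10 and 11 are reported with the 2^(−ΔΔCT) method, relative to RPL13a and normalized to leg tissue. A minimal sketch of that arithmetic follows. The function name, argument names, and CT values are illustrative assumptions, not data from the study, and the sketch assumes roughly 100% amplification efficiency for both genes.

```python
import math


def relative_expression(ct_target_sample, ct_reference_sample,
                        ct_target_calibrator, ct_reference_calibrator):
    """Fold change of a target gene by the 2^(-ddCT) method.

    'sample' is the tissue of interest (for example, a silk gland),
    'calibrator' is the normalizing tissue (leg tissue in the captions),
    and 'reference' is the housekeeping gene (RPL13a in the captions).
    """
    # dCT: target CT minus reference CT within each tissue.
    delta_ct_sample = ct_target_sample - ct_reference_sample
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    # ddCT: sample dCT minus calibrator dCT.
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical CT values: the target amplifies 10 cycles earlier in the gland
# than in leg tissue, while the housekeeping gene is unchanged.
fold_change = relative_expression(18.0, 20.0, 28.0, 20.0)
log10_fold_change = math.log10(fold_change)  # the plots use a log10 y axis
```

With these made-up inputs the gland value is 1,024-fold over leg tissue, and identical CT values in both tissues give a fold change of 1.
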
Accession codes

Primary accessions

BioProject

NCBI Reference Sequence

References

  1. Natural History Museum Bern. The World Spider Catalog, version 18.0 http://wsc.nmbe.ch/ (accessed 9 November 2016).
  2. Garrison, N.L. et al. Spider phylogenomics: untangling the Spider Tree of Life. PeerJ 4, e1719 (2016).
  3. Blackledge, T.A. et al. Reconstructing web evolution and spider diversification in the molecular era. Proc. Natl. Acad. Sci. USA 106, 5229–5234 (2009).
  4. Agnarsson, I., Kuntner, M. & Blackledge, T.A. Bioprospecting finds the toughest biological material: extraordinary silk from a giant riverine orb spider. PLoS One 5, e11234 (2010).
  5. Swanson, B.O., Blackledge, T.A., Beltrán, J. & Hayashi, C.Y. Variation in the material properties of spider dragline silk across species. Appl. Phys., A Mater. Sci. Process. 82, 213–218 (2006).
  6. Yang, Y. et al. Toughness of spider silk at high and low temperatures. Adv. Mater. 17, 84–88 (2005).
  7. Steven, E. et al. Carbon nanotubes on a spider silk scaffold. Nat. Commun. 4, 2435 (2013).
  8. Wright, S. & Goodacre, S.L. Evidence for antimicrobial activity associated with common house spider silk. BMC Res. Notes 5, 326 (2012).
  9. Vollrath, F. & Knight, D.P. Liquid crystalline spinning of spider silk. Nature 410, 541–548 (2001).
  10. Rising, A. & Johansson, J. Toward spinning artificial spider silk. Nat. Chem. Biol. 11, 309–315 (2015).
  11. Gosline, J.M., DeMont, M.E. & Denny, M.W. The structure and properties of spider silk. Endeavour 10, 37–43 (1986).
  12. Swanson, B.O., Blackledge, T.A., Summers, A.P. & Hayashi, C.Y. Spider dragline silk: correlated and mosaic evolution in high-performance biological materials. Evolution 60, 2539–2551 (2006).
  13. Blasingame, E. et al. Pyriform spidroin 1, a novel member of the silk gene family that anchors dragline silk fibers in attachment discs of the black widow spider, Latrodectus hesperus. J. Biol. Chem. 284, 29097–29108 (2009).
  14. Geurts, P. et al. Synthetic spider silk fibers spun from Pyriform Spidroin 2, a glue silk protein discovered in orb-weaving spider attachment discs. Biomacromolecules 11, 3495–3503 (2010).
  15. Ayoub, N.A., Garb, J.E., Kuelbs, A. & Hayashi, C.Y. Ancient properties of spider silks revealed by the complete gene sequence of the prey-wrapping silk protein (AcSp1). Mol. Biol. Evol. 30, 589–601 (2013).
  16. Hu, X. et al. Araneoid egg case silk: a fibroin with novel ensemble repeat units from the black widow spider, Latrodectus hesperus. Biochemistry 44, 10020–10027 (2005).
  17. Garb, J.E. & Hayashi, C.Y. Modular evolution of egg case silk genes across orb-weaving spider superfamilies. Proc. Natl. Acad. Sci. USA 102, 11379–11384 (2005).
  18. Hayashi, C.Y. & Lewis, R.V. Molecular architecture and evolution of a modular spider silk protein gene. Science 287, 1477–1479 (2000).
  19. Adrianos, S.L. et al. Nephila clavipes Flagelliform silk-like GGX motifs contribute to extensibility and spacer motifs contribute to strength in synthetic spider silk fibers. Biomacromolecules 14, 1751–1760 (2013).
  20. Higgins, L.E., Townley, M.A. & Tillinghast, E.K. Variation in the chemical composition of orb webs built by the spider Nephila clavipes (Araneae, Tetragnathidae). J. Arachnol. 29, 82–94 (2001).
  21. Choresh, O., Bayarmagnai, B. & Lewis, R.V. Spider web glue: two proteins expressed from opposite strands of the same DNA sequence. Biomacromolecules 10, 2852–2856 (2009).
  22. Vasanthavada, K. et al. Spider glue proteins have distinct architectures compared with traditional spidroin family members. J. Biol. Chem. 287, 35986–35999 (2012).
  23. Townley, M.A. & Tillinghast, E.K. in Spider Ecophysiology (ed. Nentwig, W.) 283–302 (Springer, 2013).
  24. Townley, M.A., Pu, Q., Zercher, C.K., Neefus, C.D. & Tillinghast, E.K. Small organic solutes in sticky droplets from orb webs of the spider Zygiella atrica (Araneae; Araneidae): β-alaninamide is a novel and abundant component. Chem. Biodivers. 9, 2159–2174 (2012).
  25. Blackledge, T.A. & Hayashi, C.Y. Unraveling the mechanical properties of composite silk threads spun by cribellate orb-weaving spiders. J. Exp. Biol. 209, 3131–3140 (2006).
  26. Chaw, R.C. et al. Intragenic homogenization and multiple copies of prey-wrapping silk genes in Argiope garden spiders. BMC Evol. Biol. 14, 31 (2014).
  27. Sanggaard, K.W. et al. Spider genomes provide insight into composition and evolution of venom and silk. Nat. Commun. 5, 3765 (2014).
  28. Beckwitt, R. & Arcidiacono, S. Sequence conservation in the C-terminal region of spider silk proteins (Spidroin) from Nephila clavipes (Tetragnathidae) and Araneus bicentenarius (Araneidae). J. Biol. Chem. 269, 6661–6663 (1994).
  29. Gatesy, J., Hayashi, C., Motriuk, D., Woods, J. & Lewis, R. Extreme diversity, conservation, and convergence of spider silk fibroin sequences. Science 291, 2603–2605 (2001).
  30. Rising, A., Hjälm, G., Engström, W. & Johansson, J. N-terminal nonrepetitive domain common to dragline, flagelliform, and cylindriform spider silk proteins. Biomacromolecules 7, 3120–3124 (2006).
  31. Garb, J.E., Ayoub, N.A. & Hayashi, C.Y. Untangling spider silk evolution with spidroin terminal domains. BMC Evol. Biol. 10, 243 (2010).
  32. Blackledge, T.A., Kuntner, M. & Agnarsson, I. in Advances in Insect Physiology (ed. Casas, J.) Vol. 41, 175–262 (Burlington Academic Press, 2011).
  33. Kuwana, Y., Sezutsu, H., Nakajima, K., Tamada, Y. & Kojima, K. High-toughness silk produced by a transgenic silkworm expressing spider (Araneus ventricosus) dragline silk protein. PLoS One 9, e105325 (2014).
  34. Gosline, J.M., Guerette, P.A., Ortlepp, C.S. & Savage, K.N. The mechanical design of spider silks: from fibroin sequence to mechanical function. J. Exp. Biol. 202, 3295–3303 (1999).
  35. Kuntner, M., Arnedo, M.A., Trontelj, P., Lokovšek, T. & Agnarsson, I. A molecular phylogeny of nephilid spiders: evolutionary history of a model lineage. Mol. Phylogenet. Evol. 69, 961–979 (2013).
  36. Gaines, W.A. IV & Marcotte, W.R. Jr. Identification and characterization of multiple Spidroin 1 genes encoding major ampullate silk proteins in Nephila clavipes. Insect Mol. Biol. 17, 465–474 (2008).
  37. Bond, J.E. et al. Phylogenomics resolves a spider backbone phylogeny and rejects a prevailing paradigm for orb web evolution. Curr. Biol. 24, 1765–1771 (2014).
  38. Holt, C. & Yandell, M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics 12, 491 (2011).
  39. Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59 (2004).
  40. Stanke, M., Diekhans, M., Baertsch, R. & Haussler, D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24, 637–644 (2008).
  41. Hoff, K.J. & Stanke, M. WebAUGUSTUS—a web service for training AUGUSTUS and predicting genes in eukaryotes. Nucleic Acids Res. 41, W123–W128 (2013).
  42. Colgin, M.A. & Lewis, R.V. Spider minor ampullate silk proteins contain new repetitive sequences and highly conserved non-silk-like “spacer regions”. Protein Sci. 7, 667–672 (1998).
  43. Lewis, R.V. Spider silk: the unraveling of a mystery. Acc. Chem. Res. 25, 392–398 (1992).
  44. Hayashi, C.Y. & Lewis, R.V. Evidence from flagelliform silk cDNA for the structural basis of elasticity and modular nature of spider silks. J. Mol. Biol. 275, 773–784 (1998).
  45. Hayashi, C.Y., Shipley, N.H. & Lewis, R.V. Hypotheses that correlate the sequence, structure, and mechanical properties of spider silk proteins. Int. J. Biol. Macromol. 24, 271–275 (1999).
  46. Bailey, T.L. et al. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 37, W202–W208 (2009).
  47. Vollrath, F. Spider webs and silks. Sci. Am. 266, 70–76 (1992).
  48. Casem, M.L., Collin, M.A., Ayoub, N.A. & Hayashi, C.Y. Silk gene transcripts in the developing tubuliform glands of the Western black widow, Latrodectus hesperus. J. Arachnol. 38, 99–103 (2010).
  49. Vollrath, F. Biology of spider silk. Int. J. Biol. Macromol. 24, 81–88 (1999).
  50. Andersson, M. et al. Carbonic anhydrase generates CO2 and H+ that drive spider silk formation via opposite effects on the terminal domains. PLoS Biol. 12, e1001921 (2014).
  51. Chaw, R.C., Correa-Garhwal, S.M., Clarke, T.H., Ayoub, N.A. & Hayashi, C.Y. Proteomic evidence for components of spider silk synthesis from black widow silk glands and fibers. J. Proteome Res. 14, 4223–4231 (2015).
  52. Clarke, T.H. et al. Multi-tissue transcriptomics of the black widow spider reveals expansions, co-options, and functional processes of the silk gland gene toolkit. BMC Genomics 15, 365 (2014).
  53. Lane, A.K., Hayashi, C.Y., Whitworth, G.B. & Ayoub, N.A. Complex gene expression in the dragline silk producing glands of the Western black widow (Latrodectus hesperus). BMC Genomics 14, 846 (2013).
  54. Pouchkina, N.N., Stanchev, B.S. & McQueen-Mason, S.J. From EST sequence to spider silk spinning: identification and molecular characterisation of Nephila senegalensis major ampullate gland peroxidase NsPox. Insect Biochem. Mol. Biol. 33, 229–238 (2003).
  55. Undheim, E.A.B. et al. A proteomics and transcriptomics investigation of the venom from the barychelid spider Trittame loki (brush-foot trapdoor). Toxins (Basel) 5, 2488–2503 (2013).
  56. Hagn, F. et al. A conserved spider silk domain acts as a molecular switch that controls fibre assembly. Nature 465, 239–242 (2010).
  57. Bard, F. et al. Functional genomics reveals genes involved in protein secretion and Golgi organization. Nature 439, 604–607 (2006).
  58. Zhao, Y., Ayoub, N.A. & Hayashi, C.Y. Chromosome mapping of dragline silk genes in the genomes of widow spiders (Araneae, Theridiidae). PLoS One 5, e12804 (2010).
  59. Hayashi, C.Y., Blackledge, T.A. & Lewis, R.V. Molecular and mechanical characterization of aciniform silk: uniformity of iterated sequence modules in a novel member of the spider silk fibroin gene family. Mol. Biol. Evol. 21, 1950–1959 (2004).
  60. Starrett, J., Garb, J.E., Kuelbs, A., Azubuike, U.O. & Hayashi, C.Y. Early events in the evolution of spider silk genes. PLoS One 7, e38084 (2012).
  61. Verstrepen, K.J., Jansen, A., Lewitter, F. & Fink, G.R. Intragenic tandem repeats generate functional variability. Nat. Genet. 37, 986–990 (2005).
  62. Fondon, J.W. III & Garner, H.R. Molecular origins of rapid and continuous morphological evolution. Proc. Natl. Acad. Sci. USA 101, 18058–18063 (2004).
  63. Suter, R.B. & Stratton, G.E. Scytodes vs. Schizocosa: predatory techniques and their morphological correlates. J. Arachnol. 33, 7–15 (2005).
  64. Suter, R.B. & Stratton, G.E. Spitting performance parameters and their biomechanical implications in the spitting spider, Scytodes thoracica. J. Insect Sci. 9, 1–15 (2009).
  65. Clements, R. & Li, D.Q. Regulation and non-toxicity of the spit from the pale spitting spider Scytodes pallida (Araneae: Scytodidae). Ethology 111, 311–321 (2005).
  66. Zobel-Thropp, P.A., Correa, S.M., Garb, J.E. & Binford, G.J. Spit and venom from Scytodes spiders: a diverse and distinct cocktail. J. Proteome Res. 13, 817–835 (2014).
  67. Teulé, F. et al. Silkworms transformed with chimeric silkworm/spider silk genes spin composite silk fibers with improved mechanical properties. Proc. Natl. Acad. Sci. USA 109, 923–928 (2012).
  68. Bolger, A.M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
  69. Xu, H. et al. FastUniq: a fast de novo duplicates removal tool for paired short reads. PLoS One 7, e52249 (2012).
  70. Gnerre, S. et al. High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc. Natl. Acad. Sci. USA 108, 1513–1518 (2011).
  71. Luo, R. et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1, 18 (2012).
  72. Wences, A.H. & Schatz, M.C. Metassembler: merging and optimizing de novo genome assemblies. Genome Biol. 16, 207 (2015).
  73. Bradnam, K.R. et al. Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. Gigascience 2, 10 (2013).
  74. Simão, F.A., Waterhouse, R.M., Ioannidis, P., Kriventseva, E.V. & Zdobnov, E.M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
  75. Grabherr, M.G. et al. Full-length transcriptome assembly from RNA–Seq data without a reference genome. Nat. Biotechnol. 29, 644–652 (2011).
  76. Haas, B.J. et al. De novo transcript sequence reconstruction from RNA–seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8, 1494–1512 (2013).
  77. Wu, T.D. & Watanabe, C.K. GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21, 1859–1875 (2005).
  78. Dobin, A. et al. STAR: ultrafast universal RNA–seq aligner. Bioinformatics 29, 15–21 (2013).
  79. Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110, 462–467 (2005).
  80. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
  81. Slater, G.S.C. & Birney, E. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6, 31 (2005).
  82. Lowe, T.M. & Eddy, S.R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).
  83. UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Res. 43, D204–D212 (2015).
  84. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
  85. Mitchell, A. et al. The InterPro protein families database: the classification resource after 15 years. Nucleic Acids Res. 43, D213–D221 (2015).
  86. Kearse, M. et al. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28, 1647–1649 (2012).
  87. Rogers, M.B. et al. Intrahost dynamics of antiviral resistance in influenza A virus reflect complex patterns of segment linkage, reassortment, and natural selection. MBio 6, e02464-14 (2015).
  88. English, A.C. et al. Mind the gap: upgrading genomes with Pacific Biosciences RS long-read sequencing technology. PLoS One 7, e47768 (2012).
  89. Chaisson, M.J. & Tesler, G. Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory. BMC Bioinformatics 13, 238 (2012).
  90. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv https://arxiv.org/abs/1303.3997 (2013).
  91. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
  92. Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993 (2011).
  93. Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
  94. Watterson, G.A. On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7, 256–276 (1975).
  95. Scharlaken, B. et al. Reference gene selection for insect expression studies using quantitative real-time PCR: the head of the honeybee, Apis mellifera, after a bacterial challenge. J. Insect Sci. 8, 1–10 (2008).
  96. Livak, K.J. & Schmittgen, T.D. Analysis of relative gene expression data using real-time quantitative PCR and the 2^−ΔΔCT method. Methods 25, 402–408 (2001).
  97. Krzywinski, M. et al. Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009).

Author information

Affiliations

  1. Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, USA.

    • Paul L Babb,
    • Nicholas F Lahens,
    • Eun Ji Kim &
    • Benjamin F Voight
  2. Department of Genetics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, USA.

    • Paul L Babb &
    • Benjamin F Voight
  3. Institute for Translational Medicine and Therapeutics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, USA.

    • Nicholas F Lahens,
    • Eun Ji Kim &
    • Benjamin F Voight
  4. Department of Biology, University of California, Riverside, Riverside, California, USA.

    • Sandra M Correa-Garhwal &
    • Cheryl Y Hayashi
  5. Genomics and Computational Biology Graduate Group, University of Pennsylvania, Philadelphia, Pennsylvania, USA.

    • David N Nicholson
  6. Divisions of Perinatal Biology and Immunobiology, Perinatal Institute, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA.

    • John B Hogenesch
  7. Biological Institute, Research Centre of the Slovenian Academy of Sciences and Arts, Ljubljana, Slovenia.

    • Matjaž Kuntner
  8. Department of Biology, University of Vermont, Burlington, Vermont, USA.

    • Linden Higgins &
    • Ingi Agnarsson

Contributions

B.F.V., L.H., and I.A. conceived of the project. P.L.B., B.F.V., N.F.L., and J.B.H. designed the experiments. B.F.V. and J.B.H. contributed reagents and materials. P.L.B., L.H., S.M.C.-G., and C.Y.H. provided samples. C.Y.H., S.M.C.-G., and L.H. performed specimen dissections. P.L.B. conducted all bench experiments. P.L.B. performed all assembly and annotation pipelines. P.L.B., B.F.V., and D.N.N. ran motif searches. P.L.B., N.F.L., and E.J.K. performed expression analyses. P.L.B., B.F.V., C.Y.H., J.B.H., M.K., L.H., and I.A. reviewed analyses. P.L.B. and B.F.V. prepared figures and tables. P.L.B. and B.F.V. wrote the first draft of the manuscript. All authors reviewed drafts of the paper.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Figures

  1. Supplementary Figure 1: The golden orb-weaver spider’s morphology, reported silk gland anatomy, and web construction. (641 KB)

    (a) Photographs of N. clavipes showing an adult female at the center hub of her orb web (left) and a view of the spinneret silk-extruding organs on the underside of the female abdomen (right). (b) Silk gland anatomy of N. clavipes, showing the seven different female araneoid gland morphologies found in the abdomen and the different classes of silk proteins produced. Each silk class has specific physical characteristics; for example, the minor and major ampullate spidroins produce silks with great tensile strength, flagelliform silk has great extensibility, and aggregate silks are non-fibrous, sticky glue. This illustration (inspired by ref. 52) depicts one set of silk glands and spinnerets from a bilateral pair and indicates that each gland type produces a specific type of silk. However, our expression data (Fig. 5a) suggest that this is not the case, supporting previous findings (refs. 48, 50, and 53) that individual glands can express multiple classes of spidroins. Note: the gland type coloration scheme and corresponding silk use pictograms defined here are used in later figures. (c) Putative applications of spider silk types in web construction (web diagram adapted from ref. 54), as described in previous studies. (i) Web building and maintenance: major ampullate silk is used for bridgelines and web radii; minor ampullate silk is used for the temporary spiral; piriform silk attaches fibers together and to substrates; flagelliform silk is used for the capture spiral; aggregate silks are sticky, aiding in adherence and prey capture. (ii) Prey wrapping: aciniform (top inset photo). (iii) Silk egg casings: tubuliform (bottom inset photo). References for silk classes and their purported uses are listed in the main text. (Photos provided by P.L.B.)

  2. Supplementary Figure 2: Maximum-likelihood phylogenetic gene tree of 28 N. clavipes spidroins in the context of 55 spidroins from other spider taxa. (107 KB)

    The spidroin gene tree is rooted with a Bothriocyrtum californicum fibroin sequence (B.c. fibroin1; accession HM752562) and is based on multiple-sequence alignment (MSA) using the first ~130 amino acid residues of the N-terminal domain encoded by each gene. MSA was performed with Geneious, Clustal, and BLOSUM62, and the consensus tree was built with PhyML (Supplementary Note). Bootstrap proportions >50 (based on 1,000 replicates) are shown to the left of their respective nodes. Colors follow the gland/spidroin class designations shown in Supplementary Figure 1. Codes and accession numbers for different spidroins and taxa are listed in Supplementary Table 10.

  3. Supplementary Figure 3: Maximum-likelihood phylogenetic gene trees for the catalog of 28 spidroins identified in N. clavipes. (165 KB)

    (a,b) Unrooted maximum-likelihood phylogenetic trees for the catalog of 28 spidroins identified in N. clavipes, shown as both transformed (a) and non-transformed (b) layouts. Both trees are based on a multiple-sequence alignment (MSA) using the first ~130 amino acid residues of the N-terminal domain for each N. clavipes spidroin. MSA was performed with Geneious, Clustal, and BLOSUM62, and the consensus tree was built with PhyML (Supplementary Note). Bootstrap proportions >50 (based on 1,000 replicates) are shown to the left of their respective nodes. Colors follow the gland/spidroin class designations shown in Supplementary Figure 1. Codes for N. clavipes spidroins are listed in Supplementary Table 12.

  4. Supplementary Figure 4: Agarose gel images of long-range PCR–amplified MiSp sequences used for validation of draft assembly, scaffold bridging, and gap closure. (165 KB)

    The top panel highlights a single lane with an LR-PCR reaction (golden rectangle) for MiSp-c. The bottom panel highlights four lanes with LR-PCR reactions (golden rectangle) for MiSp-d. In both cases, multiple large bands are visible, indicating amplification of multiple targets that presumably represent genomic regions with high sequence similarity to the binding sites of the oligonucleotide primers used to isolate both MiSp types.

  5. Supplementary Figure 5: Distribution of amino acid frequency for N. clavipes ‘gold’ gene models. (138 KB)

    Amino acid frequency distributions were calculated for all 20 amino acids for all mRNA transcripts from the gold gene model set (n = 17,989 mRNA sequences). Several spidroins were found at the extreme ends of the individual amino acid distributions (Supplementary Fig. 5 and Supplementary Note). Box-and-whisker plots show the range of frequency values (y axis) for the given amino acid residues (x axis). Thick black center lines represent median values. Upper whiskers represent the largest observation ≤ the upper quartile (Q3) + 1.5 interquartile range (IQR), and lower whiskers represent the smallest observation ≥ the lower quartile (Q1) – 1.5 IQR.

  6. Supplementary Figure 6: Distribution of amino acid frequency for 28 N. clavipes spidroin genes. (100 KB)

    Amino acid frequency distributions were calculated for all 20 amino acids for all N. clavipes spidroin genes (n = 28 sequences). Box-and-whisker plots show the range of frequency values (y axis) for the given amino acid residues (x axis). Thick black center lines represent median values. Upper whiskers represent the largest observation ≤ the upper quartile (Q3) + 1.5 interquartile range (IQR), and the lower whiskers represent the smallest observation ≥ the lower quartile (Q1) – 1.5 IQR. Overall, spidroins exhibit enrichment of alanine, glycine, and serine residues, which have significantly different proportions when compared to 17,989 mRNA sequences from the gold gene set (Wilcoxon rank-sum test; Supplementary Fig. 5 and Supplementary Note). **P < 0.01.

  7. Supplementary Figure 7: Shared and private motif occurrences in N. clavipes spidroins. (137 KB)

    Bar graph comparing the number of shared (gold) versus private (dark gray) distinct repetitive motif occurrences observed in the different N. clavipes spidroins (n = 28 sequences).

  8. Supplementary Figure 8: Shared and private cassette occurrences in N. clavipes spidroins. (122 KB)

    Bar graph comparing the number of shared (gold) and private (dark gray) distinct repetitive cassette occurrences observed in the different N. clavipes spidroins (n = 28 sequences).

  9. Supplementary Figure 9: RNA–seq expression patterns of spidroin genes in 13 N. clavipes tissue samples. (267 KB)

    Heat map showing the absolute number of normalized RNA–seq reads that map to spidroin transcripts, assayed in ten individual silk glands, one venom gland isolate, and two brain isolates collected from two non-gravid females, Nep-008 and Nep-009. Owing to extensive sequence similarity between MaSp-b and MaSp-c, it was not possible to distinguish between reads that mapped to these two spidroins; thus, data for these two transcripts are presented together as “MaSp-b,c”. Reads mapping to MaSp-h and AgSp-c exceeded the heat map’s informative range; thus, we have included bar graph insets (right) confirming that reads mapping to MaSp-h (top inset) and AgSp-c (bottom inset) are substantially more abundant in silk glands than in venom gland or brain.

  10. Supplementary Figure 10: Distributions of relative expression values for 29 N. clavipes genes in seven tissue types. (322 KB)

    Box-and-whisker plots of the relative expression for all 28 N. clavipes spidroin genes and 1 venom gene (PR-1) in tissue dissections (n = 3 independent-specimen biological replicates per tissue) assayed by qPCR. Tissues included venom glands, five anatomically distinct silk glands, and ‘other’ silk glands (aciniform and piriform glands attached to the spinneret) and are shown left of the y axis, whereas relative expression (2^−ΔΔCT method; ref. 46) is depicted on the y axis (log10 scale) organized in rows by tissue type. Box-and-whisker plots show the range of expression values for the given genes (x axis) relative to RPL13a (housekeeping gene) expression and normalized to leg tissue. Thick black center lines represent median values. Upper whiskers represent the largest observation ≤ the upper quartile (Q3) + 1.5 interquartile range (IQR), and the lower whiskers represent the smallest observation ≥ the lower quartile (Q1) – 1.5 IQR. Asterisks indicate a single silk gland type exhibiting significantly greater expression values for a given gene versus all other silk gland types together (Wilcoxon rank-sum test). **P < 0.01.

  11. Supplementary Figure 11: Mean relative expression values of 29 N. clavipes genes in seven tissue types. (198 KB)

    Heat map showing the relative expression of N. clavipes spidroin loci in tissue dissections (n = 3 biological replicates per tissue) assayed by qPCR. Tissues included venom glands, five anatomically distinct silk glands, and ‘other’ silk glands (aciniform and piriform glands attached to the spinneret) and are shown on the x axis, with spidroins arranged on the y axis. The heat map panels depict relative mean fold change in gene expression (2^−ΔΔCT method; ref. 46) per tissue (distinct tissue dissections from n = 3 individuals) over RPL13a and normalized to leg tissue.

  12. Supplementary Figure 12: RNA–seq expression patterns of SSTs in 13 N. clavipes tissue samples. (415 KB)

    Heat map showing the absolute number of normalized reads that map to 649 non-spidroin silk gland–specific transcripts (SSTs), assayed in ten individual silk glands, one venom gland, and two brain isolates collected from two non-gravid females, Nep-008 and Nep-009. SSTs are vertically clustered based on the filtering method used for discovery (Supplementary Note), as noted by colored vertical bars at the right of the heat map. The categories defined on the left are described in Supplementary Table 15.

  13. Supplementary Figure 13: Polymorphism levels of genes and genic features in the N. clavipes genome. (170 KB)

    (a) Box-and-whisker plot comparing the distribution of θW values (ref. 43), derived from SNP counts, for 14,025 gold gene sequences with the distribution of θW values for 28 N. clavipes spidroins. Box-and-whisker plots show the range of θW values for each gene set. Thick black center lines represent median values. Upper whiskers represent the largest observation ≤ the upper quartile (Q3) + 1.5 interquartile range (IQR), and the lower whiskers represent the smallest observation ≥ the lower quartile (Q1) – 1.5 IQR. Asterisks indicate that the 28 N. clavipes spidroin genes exhibit significantly greater θW values than the collected gold gene set (Wilcoxon rank-sum test; Supplementary Note). **P < 0.01. (b) Vertical bar graph showing the mean θW values for 11 genomic feature categories, including many gold gene model subfeatures, in comparison to the mean θW values for N. clavipes spidroins, silk N termini, and silk C termini. (c) Bar graph depicting the θW values for individual N. clavipes spidroins.

PDF files

  1. Supplementary Text and Figures (4,928 KB)

    Supplementary Figures 1–13, Supplementary Tables 1–12 and Supplementary Note

Excel files

  1. Supplementary Tables 13–15 (126 KB)

    Supplementary Tables 13–15

  2. Supplementary Data (172 KB)

    N. clavipes qPCR 2^−ΔΔCT values for all loci and replicates. Includes all results of statistical tests.
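As a reading aid for the qPCR values, the sketch below shows how a relative expression value is conventionally computed with the Livak and Schmittgen method (ref. 96), using a housekeeping gene as the reference and a calibrator tissue for normalization (RPL13a and leg tissue in the captions above). The function name and the Ct values are hypothetical, and the calculation assumes roughly 100% amplification efficiency, as the method itself does.

```python
def relative_expression(ct_target_tissue: float,
                        ct_reference_tissue: float,
                        ct_target_calibrator: float,
                        ct_reference_calibrator: float) -> float:
    """Return the fold change 2^(-ddCt) of a target gene in a tissue.

    dCt  = Ct(target) - Ct(reference gene), computed within each sample
    ddCt = dCt(tissue of interest) - dCt(calibrator tissue)
    """
    delta_ct_tissue = ct_target_tissue - ct_reference_tissue
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta_ct = delta_ct_tissue - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target crosses threshold 5 cycles earlier
# (relative to the reference gene) in the silk gland than in leg tissue.
fold_change = relative_expression(
    ct_target_tissue=20.0, ct_reference_tissue=18.0,          # silk gland
    ct_target_calibrator=25.0, ct_reference_calibrator=18.0,  # leg
)
print(fold_change)
```

A value of 1 means the tissue and the calibrator express the target equally after normalization to the reference gene; each unit decrease in ΔΔCT doubles the estimated expression.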

Additional data