The oyster genome reveals stress adaptation and complexity of shell formation

Journal name: Nature
Volume: 490
Pages: 49–54
DOI: 10.1038/nature11413

Abstract

The Pacific oyster Crassostrea gigas belongs to one of the most species-rich but genomically poorly explored phyla, the Mollusca. Here we report the sequencing and assembly of the oyster genome using short reads and a fosmid-pooling strategy, along with transcriptomes of development and stress response and the proteome of the shell. The oyster genome is highly polymorphic and rich in repetitive sequences, with some transposable elements still actively shaping variation. Transcriptome studies reveal an extensive set of genes responding to environmental stress. The expansion of genes coding for heat shock protein 70 and inhibitors of apoptosis is probably central to the oyster’s adaptation to sessile life in the highly stressful intertidal zone. Our analyses also show that shell formation in molluscs is more complex than currently understood and involves extensive participation of cells and their exosomes. The oyster genome sequence fills a void in our understanding of the Lophotrochozoa.

Introduction

Oceans cover approximately 71% of the Earth’s surface and harbour most of the phylum diversity of the animal kingdom. Understanding marine biodiversity and its evolution remains a major challenge. The Pacific oyster C. gigas (Thunberg, 1793) is a marine bivalve belonging to the phylum Mollusca, which contains the largest number of described marine animal species1. Molluscs have vital roles in the functioning of marine, freshwater and terrestrial ecosystems, and have had major effects on humans, primarily as food sources but also as sources of dyes, decorative pearls and shells, vectors of parasites, and biofouling or destructive agents. Many molluscs are important fishery and aquaculture species, as well as models for studying neurobiology, biomineralization, ocean acidification and adaptation to coastal environments under climate change2, 3. As the most speciose member of the Lophotrochozoa, phylum Mollusca is central to our understanding of the biology and evolution of this superphylum of protostomes.

As sessile marine animals living in estuarine and intertidal regions, oysters must cope with harsh and dynamically changing environments. Abiotic factors such as temperature and salinity fluctuate wildly, and toxic metals and desiccation also pose serious challenges. Filter-feeding oysters face tremendous exposure to microbial pathogens. Oysters do have a notable physical line of defence against predation and desiccation in the formation of thick calcified shells, a key evolutionary innovation making molluscs a successful group. However, acidification of the world’s oceans by uptake of anthropogenic carbon dioxide poses a potentially serious threat to this ancient adaptation4. Understanding biomineralization and molluscan shell formation is, thus, a major area of interest5. Crassostrea gigas is also an interesting model for developmental biology owing to its mosaic development with typical molluscan stages, including trochophore and veliger larvae and metamorphosis.

A complete genome sequence of C. gigas would enable a more thorough understanding of oyster biology and the evolution of Lophotrochozoa. One of the main challenges, however, is the high level of polymorphism present in oysters and many other marine invertebrates6, 7, 8. To overcome this, an oyster derived from four generations of full-sibling mating (coefficient of inbreeding, F = 0.59) was used for genome sequencing and assembly (Supplementary Text B1) through fosmid pooling, next-generation sequencing (NGS) and hierarchical assembly. Combining these genomic data with transcriptomes from different organs, different developmental stages and adults challenged with stressors, together with mass spectrometric analysis of shell proteins, allowed us to explore characteristics of the oyster genome and key aspects of molluscan biology related to stress response and shell formation.

Sequencing and hierarchical assembly

NGS technology has been successfully applied to de novo genome sequencing and assembly using whole-genome shotgun strategies9, 10, 11, 12, 13. We initially generated 155-fold coverage of Illumina whole-genome shotgun reads (Supplementary Table 1) but could not assemble them adequately because of high levels of polymorphism and abundant repetitive sequences (Supplementary Text B2 and Supplementary Fig. 1). Because alternative strategies, such as adding longer Roche 454 reads12, 13 or traditional bacterial artificial chromosome (BAC)-to-BAC sequencing, are expensive, we opted instead for a more cost-effective fosmid-pooling strategy. In brief, a fosmid library was constructed, and 145,170 clones (~tenfold genome coverage) were evenly and randomly assigned to 1,613 pools, each of which was sequenced to ~60-fold depth and assembled separately (Fig. 1 and Supplementary Table 1). Contigs from each pool were merged into supercontigs totalling 1,002 megabases (Mb) (Supplementary Text B4.1–3). This total was larger than the genome-size estimates of 637 Mb from flow cytometry and 545 Mb from k-mer (k-base fragment) analysis (Supplementary Text B1, 2.3) because some allelic variants failed to merge (Supplementary Figs 3 and 4). Self-to-self whole-genome alignment with LASTZ14 and sequencing-depth information were used to remove redundancy from the assembly (Supplementary Text B4.4), and the remaining 446 Mb were retained for further scaffolding with paired-end data (Fig. 1). The final assembly comprised 559 Mb, with a contig N50 size (the length such that contigs of at least that length together cover 50% of the assembly) of 19.4 kilobases (kb) and a scaffold N50 size of 401 kb (Supplementary Text B4.5 and Supplementary Table 3). Over 90% of the assembly was covered by the longest 1,670 (14%) scaffolds.
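
To make the N50 definition concrete, the sketch below computes an Nx statistic together with the number of longest sequences needed to reach it; with `fraction=0.9` it answers the same kind of question as the statement that the longest 1,670 scaffolds cover over 90% of the assembly. The function name `nx_stats` and the toy lengths are hypothetical illustrations, not part of the authors' pipeline.

```python
from typing import Sequence, Tuple


def nx_stats(lengths: Sequence[int], fraction: float = 0.5) -> Tuple[int, int]:
    """Return (Nx length, number of longest sequences needed) for `fraction`.

    Hypothetical helper, not the published pipeline. Sequences are sorted
    longest first and their lengths summed until the running total reaches
    `fraction` of the assembly size. The length of the sequence that crosses
    the threshold is the Nx value (N50 when fraction=0.5); its 1-based rank is
    how many of the longest sequences are needed to cover that fraction.
    """
    if not lengths:
        raise ValueError("assembly contains no sequences")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    target = fraction * sum(lengths)
    running = 0
    for rank, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= target:
            return length, rank
    # Not reached: after the last sequence the running total equals the full size.
    raise RuntimeError("threshold not reached")


# Toy scaffold lengths (hypothetical); total = 300.
toy = [100, 80, 50, 40, 30]
print(nx_stats(toy))       # N50: (80, 2)
print(nx_stats(toy, 0.9))  # N90: (40, 4)
```

For the toy lengths, the two longest sequences (180 of 300 bases) are the first to pass the 50% mark, so the N50 is 80; four sequences are needed to reach 90%, giving an N90 of 40.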

Figure 1: Fosmid-pooling strategy for oyster genome assembly.

Genomic DNA was randomly sheared into fragments. a, b, A 40-kb-insert fosmid library was constructed (a), and 145,170 fosmid clones were randomly selected and assigned to 1,613 pools, each containing 90 clones covering 0.57% of the diploid genome (b). For each pool, three barcoded Illumina short-insert libraries (two 200-bp and one 500-bp) were constructed, ~60-fold coverage of 90-bp reads (20-fold per library) was generated, and the reads were assembled using SOAPdenovo with optimized parameters. Assemblies from each pool were corrected and reassembled if unexpected connections caused by highly similar sequences from different fosmids were detected, and gaps were filled by local assembly. c, Fosmid scaffolds were split into contigs at unfilled gaps, leaving no undetermined bases in the sequences. Each base was assigned a Phred-like quality score determined by its coverage and alignment mismatches, and these sequences were merged into supercontigs using the overlap–layout–consensus method. Redundancy was removed using self-to-self alignment and sequencing-depth information. d, Whole-genome shotgun Illumina libraries (200-bp to 20-kb inserts) were constructed from sheared genomic DNA for mate-pair Illumina sequencing. e, The fosmid supercontigs were linked into scaffolds using (1) the whole-genome shotgun sequences; (2) inferred paired-end information extracted from assembled pool scaffolds, with span sizes ranging from 50 bp to 37.5 kb; and (3) 225,000 fosmid ends sequenced using Sanger technology.
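
The pool design described in this caption can be checked with simple arithmetic. The sketch below assumes that every insert is exactly 40 kb and uses the 637-Mb flow-cytometry estimate from the main text as the genome size; both are simplifications, and the constant and function names are illustrative only.

```python
# Inputs taken from the text. The uniform 40-kb insert size is a simplifying
# assumption; real fosmid inserts vary around this value.
CLONES = 145_170
POOLS = 1_613
INSERT_BP = 40_000
GENOME_BP = 637_000_000  # flow-cytometry genome-size estimate


def pooling_summary(clones: int, pools: int, insert_bp: int, genome_bp: int) -> dict:
    """Per-pool and whole-library coverage implied by a fosmid-pooling design."""
    clones_per_pool = clones / pools
    pool_span_bp = clones_per_pool * insert_bp
    return {
        "clones_per_pool": clones_per_pool,
        "pool_span_mb": pool_span_bp / 1e6,
        "pool_percent_of_genome": 100 * pool_span_bp / genome_bp,
        "library_fold_coverage": clones * insert_bp / genome_bp,
    }


summary = pooling_summary(CLONES, POOLS, INSERT_BP, GENOME_BP)
for key, value in summary.items():
    print(f"{key}: {value:.2f}")
```

Under these assumptions each pool holds 90 clones spanning ~3.6 Mb (about 0.57% of 637 Mb), and the whole library gives roughly ninefold coverage, consistent with the ~tenfold figure quoted in the text. Because each pool samples such a small slice of the genome, the design is intended to make it less likely that alleles or repeat copies from the same region are assembled together within one pool.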

To assess the completeness of the assembly, short-insert library reads (<2 kb) that had been used in the assembly, representing 105-fold coverage (Supplementary Table 1), were aligned against the assembly. Over 99% of these reads mapped successfully using a combination of the Burrows–Wheeler Aligner15 and the more sensitive LASTZ (Supplementary Fig. 5 and Supplementary Table 4). The integrity of the assembly is further supported by the successful mapping of 99% of the BAC sequences obtained by Sanger sequencing and 98% of ~68,000 expressed sequence tags from 454 sequencing (Supplementary Text B5, Supplementary Fig. 6 and Supplementary Tables 5 and 6). Fosmid pooling has previously been used for re-sequencing16, 17; our results show that combining fosmid pooling, NGS and hierarchical assembly provides a new, cost-effective alternative for de novo sequencing and assembly of complex genomes.

Polymorphism and repetitive sequences

To understand polymorphism in the oyster genome, we analysed allelic variation in the assembled (inbred) genome and in one re-sequenced wild oyster (wild) (Supplementary Text C1). The inbred genome contained 3.1 million single-nucleotide polymorphisms and 258,405 short insertions/deletions (indels, 1–40 base pairs (bp)), yielding a sequence polymorphism rate of 0.73%, whereas the wild genome had 3.8 million single-nucleotide polymorphisms and 238,182 indels, a polymorphism rate of 1.3% (Supplementary Table 7), comparable to previous estimates18. This 44% reduction in polymorphism in the inbred genome is smaller than the 59.4% predicted from four generations of brother–sister mating, indicating that selection favouring heterozygotes had occurred19. The combined polymorphism rate of the inbred and wild genomes (across four haplotypes) was 2.3%, higher than that of most animal genomes studied20, 21 but comparable to that of known high-polymorphism species7. Across the inbred and wild genomes, we found 3,094 short indels in coding regions that are inferred to cause frameshifts in 2,665 genes, a potentially important source of recessive lethal mutations.
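
The 59.4% expectation follows from the classical recurrence for the inbreeding coefficient under repeated full-sib mating, F_t = (1 + 2F_{t−1} + F_{t−2})/4, which gives 0.25, 0.375, 0.5 and ~0.594 after one to four generations (matching F = 0.59 for the sequenced individual). The sketch below reproduces that value and compares it with the observed reduction, assuming (as a simplification) that the wild genome's 1.3% polymorphism rate stands in for the heterozygosity of the non-inbred base population. Function names are hypothetical.

```python
def full_sib_inbreeding(generations: int) -> float:
    """Inbreeding coefficient after `generations` rounds of full-sib mating.

    Classical recurrence F_t = (1 + 2*F_{t-1} + F_{t-2}) / 4, starting from a
    non-inbred, unrelated base population (F_{-1} = F_0 = 0).
    """
    if generations < 0:
        raise ValueError("generations must be non-negative")
    f_prev2, f_prev1 = 0.0, 0.0  # F_{t-2}, F_{t-1}
    for _ in range(generations):
        f_prev2, f_prev1 = f_prev1, (1 + 2 * f_prev1 + f_prev2) / 4
    return f_prev1


def observed_reduction(inbred_rate: float, wild_rate: float) -> float:
    """Proportional loss of polymorphism; using the wild rate as baseline is an assumption."""
    return 1 - inbred_rate / wild_rate


predicted = full_sib_inbreeding(4)        # 0.59375, i.e. ~59.4%
observed = observed_reduction(0.73, 1.3)  # ~0.44, i.e. ~44%
print(f"predicted {predicted:.1%}, observed {observed:.1%}")
```

Expected heterozygosity after inbreeding scales as (1 − F) times the base value, so F itself is the predicted proportional loss; the observed loss falling short of F is what the text interprets as selection favouring heterozygotes.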

k-mer-based analysis of the oyster genome showed that ~35% of 17-mers had at least two identical copies in the genome, suggesting an abundance of repetitive sequences (Supplementary Fig. 1). Consistent with this, homology searching and ab initio prediction identified 202 Mb (36% of the genome) of repetitive sequences (Supplementary Text C2 and Supplementary Table 8). Over 62% of the detected repeats could not be assigned to known categories, reflecting the paucity of genomic information from molluscs22. Large numbers of transposase (359) and reverse-transcriptase (779) gene fragments were detected, and over 96% of these had detectable transcripts (Supplementary Fig. 8). Alignment of the wild sequence against the assembly identified 20,605 deletions (>100 bp), over 80% of which overlapped with detected transposable elements, suggesting that transposable elements may have an active role in shaping genome variation. Using MITE-hunter23, we detected 157,007 copies of miniature inverted-repeat transposable elements (MITEs), accounting for a remarkable 8.82% of the genome (Supplementary Text C2.3 and Supplementary Table 9). Pairwise comparisons show extremely low sequence divergence in some MITE families (Supplementary Fig. 9), indicating that they may still be active.
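
One way to read the repeated-17-mer statistic is as the fraction of k-mer positions whose k-mer is seen at least twice; the exact definition used by the authors is in Supplementary Fig. 1, so this interpretation is an assumption. The sketch below is deliberately simplified: it scans one strand of an assembled sequence, does not merge reverse complements (canonical k-mers) and does not treat ambiguous bases specially, whereas genome-scale analyses typically count canonical k-mers from raw reads with dedicated tools. The function name is hypothetical.

```python
from collections import Counter


def repeated_kmer_fraction(seq: str, k: int = 17) -> float:
    """Fraction of k-mer positions whose k-mer occurs at least twice in `seq`.

    Simplified sketch (assumption): one strand only, no canonical k-mer
    merging, and no special handling of ambiguous bases such as N.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    counts = Counter(kmers)
    repeated_positions = sum(1 for kmer in kmers if counts[kmer] >= 2)
    return repeated_positions / len(kmers)


# Toy example with k=4: ACGT occurs twice among the five 4-mers.
print(repeated_kmer_fraction("ACGTACGT", k=4))  # 0.4
```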

Gene annotation and developmental genomics

A total of 28,027 genes encoding proteins of 50 amino acids or more were predicted by combining de novo prediction with evidence-based searches using reference genomes, oyster expressed sequence tags and transcriptomes from multiple organs and developmental stages (Supplementary Text D1 and E1 and Supplementary Fig. 11); 96.1% of these showed expression (reads per kilobase per million mapped reads (RPKM) > 1 in at least one transcriptome; Supplementary Text D2). Of the inferred proteins, 21,085 matched entries in the SWISS-PROT, InterPro or TrEMBL databases. These genes, together with their transcriptome profiles from 7 adult organs and 38 developmental stages, provide valuable resources for comparative genomic analysis (Supplementary Text E2 and 3), functional inference, and studies of development and organogenesis (Supplementary Text F2 and Supplementary Fig. 15).
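
RPKM normalizes a gene's read count by both transcript length (per kilobase) and sequencing depth (per million mapped reads): RPKM = reads × 10^9 / (length in bp × total mapped reads). The sketch below applies that standard formula and the "RPKM > 1 in at least one transcriptome" expression criterion; the function names and example counts are hypothetical.

```python
from typing import Iterable


def rpkm(gene_reads: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    # reads / (length_kb * mapped_millions) == reads * 1e9 / (length_bp * mapped)
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


def is_expressed(rpkm_values: Iterable[float], threshold: float = 1.0) -> bool:
    """True if RPKM exceeds `threshold` in at least one transcriptome."""
    return any(value > threshold for value in rpkm_values)


# Hypothetical gene: 500 reads, 2,000 bp long, library of 10 million mapped reads.
print(rpkm(500, 2_000, 10_000_000))   # 25.0
print(is_expressed([0.3, 0.8, 1.4]))  # True
```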

One notable finding of developmental interest is that the oyster Hox gene cluster is broken into four sections (Fig. 2) with flanking non-Hox genes (Supplementary Fig. 16). We did not find a clear Antennapedia gene, which is present in other bivalves such as Pecten and Yoldia24 (Supplementary Fig. 17). Disruption of the Hox cluster, as also observed in tunicates, nematodes and drosophilids, has been attributed to the loss of temporal co-linearity and modified developmental control25. Supporting this model, we find that Hox genes in the oyster are not activated in an order matching their identity or genomic position, with, for example, HOX4 and HOX1 peaking before gastrulation, LOX5 and POST2 during the trochophore stage and HOX5 during the pediveliger stage (Supplementary Fig. 18 and Supplementary Table 15).

Figure 2: Clustering of Hox genes in Pacific oyster Crassostrea gigas, polychaete annelid Capitella teleta, fruitfly Drosophila melanogaster, lancelet Branchiostoma floridae and Homo sapiens.

Oblique lines indicate regions of Hox cluster that are non-contiguous or interrupted. Blue denotes anterior Hox genes, yellow denotes paralogy group 3 Hox genes, green and purple denote central Hox genes and red denotes posterior Hox genes.

Adaptation to environmental stress

Comparison with seven other sequenced genomes identified 8,654 oyster-specific genes (Supplementary Text E3.1) that are probably important in the evolution and adaptation of oysters and other molluscs. Because the oyster was the only mollusc among the genomes compared, some of these genes may be shared with other molluscs. Among these genes, Gene Ontology terms related to ‘protein binding’, ‘apoptosis’, ‘cytokine activity’ and ‘inflammatory response’ are highly enriched (P<0.0001; Supplementary Text E2 and Supplementary Table 17), indicating over-representation of some host-defence genes against biotic and abiotic stress. Manual examination shows that several gene families related to defence pathways, including protein folding, oxidation and anti-oxidation, apoptosis and immune responses, are expanded in C. gigas (Fig. 3a and Supplementary Table 18). The oyster genome contains 88 heat shock protein 70 (HSP70) genes, which have crucial roles in protecting cells against heat and other stresses, compared with ~17 in humans and 39 in sea urchins. Phylogenetic analysis shows that 71 of the oyster HSP70 genes cluster together, suggesting that the expansion is specific to the oyster (Supplementary Fig. 19). Also expanded are the cytochrome P450 (Supplementary Fig. 20) and multi-copper oxidase gene families, which are important in the biotransformation of endobiotic and xenobiotic chemicals26, and extracellular superoxide dismutases, which are important in defence against oxidative stress. The oyster genome has 48 genes coding for inhibitor of apoptosis proteins (IAPs), compared with 8 in humans and 7 in sea urchins, indicating a powerful anti-apoptosis system in oysters. Genes encoding lectin-like proteins, including C-type lectins, fibrinogen-related proteins and C1q domain-containing proteins (C1QDCs), are highly over-represented in the oyster genome (P<0.0001; Supplementary Table 18); these genes have important roles in the innate immune response of invertebrates27, 28, 29.
Interestingly, many immune-related genes, including genes coding for Gram-negative bacteria-binding proteins, peptidoglycan-recognition proteins, defensin, C-type-lectin-domain-containing proteins and C1QDCs, are highly expressed in the digestive gland (Supplementary Fig. 21), indicating that the digestive system of this filter feeder is an important first-line defence organ against pathogens.
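
Enrichment P values for Gene Ontology terms are commonly obtained with a one-sided hypergeometric (Fisher-type) test. This section does not state which test produced P<0.0001, so the sketch below illustrates the general technique rather than the authors' method; the function and argument names are hypothetical. Exact integer binomial coefficients avoid overflow even for genome-sized counts.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n).

    N: background genes with annotation
    K: background genes carrying the GO term
    n: genes in the test set (e.g. lineage-specific genes)
    k: test-set genes carrying the term
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("inconsistent counts")
    k = max(k, 0)
    upper = min(n, K)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the tail.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical counts: all 3 sampled genes carry a term that 4 of 10 genes carry.
print(hypergeom_enrichment_p(k=3, n=3, K=4, N=10))  # 4/120, about 0.0333
```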

Figure 3: Expansion, expression and pathway distribution of defence-related genes in Crassostrea gigas.

a, Expansion and expression of key genes in major stress-response pathways in C. gigas. Genes include HSPs and HSF in the heat-shock response; GRP78, CRT, CNX, GRP94, PERK, IRE1 and EIF2a in the endoplasmic reticulum unfolded-protein response (UPR(ER)); IAPs, BCL2 like, BAG, BI1, caspases, FADD and TNFR in apoptotic pathways; CYP450 and MO in oxidation; and SOD, GPX, PRX and CAT in anti-oxidation. Boxes with bold black borders indicate gene families (HSPs, IAPs and SODs) expanded in C. gigas, and the fill colours correspond to their degree of upregulation by stress (RPKM(treatment)/RPKM(control)) across 61 transcriptomes from oysters challenged with 9 types of stressors (Supplementary Text G2 and Supplementary Table 23). b, Venn diagram of common and unique genes expressed in response to temperature, salinity, air exposure and heavy-metal stress (zinc, cadmium, copper, lead and mercury), showing overlap of responses. c, Numbers of genes with and without detectable paralogues that are differentially expressed under stress and normal conditions, showing that stress-responsive genes are more likely to have paralogues (P < 1 × 10⁻¹⁰; χ² test). Green sections of the pie charts represent 1,442, 809, 358, 550 and 7,938 paralogues for air exposure, metal, temperature, salinity and normal conditions, respectively.

To investigate genome-wide responses to stress, we sequenced 61 transcriptomes from C. gigas subjected to nine stressors, including temperature, salinity, air exposure and heavy metals (Supplementary Text G1 and Supplementary Tables 19 and 20). We found that 5,844 genes were differentially expressed under at least one stressor, and genes responding to different stressors showed significant overlap (Fig. 3b and Supplementary Fig. 23a). Air exposure induced a response in the largest number of genes (4,420), indicating that air exposure is a major stressor and that oysters have evolved an extensive gene set for defence against it. Genes differentially expressed in response to stress are more likely to have paralogues (Fig. 3c), suggesting that expansion and selective retention of duplicated defence-related genes are important to oyster adaptation. Under most stressors, genes coding for HSPs, histones, IAPs and protein biogenesis were upregulated, and those for protein degradation were downregulated, pointing to concerted responses to maintain cellular homeostasis30 (Supplementary Text G3 and Supplementary Table 21). Genes involved in the unfolded protein response to endoplasmic reticulum stress (coding for calreticulin, calnexin, and the 78- and 94-kDa glucose-regulated proteins) were upregulated, indicating that protein quality control is critical to cellular homeostasis under stress.
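
Fig. 3c reports a χ² test of whether stress-responsive genes are more likely to have paralogues. For a 2 × 2 table, the Pearson statistic and its 1-degree-of-freedom P value can be computed with the standard library alone, as sketched below. The function name and the example counts are hypothetical and are not the paper's data; whether the authors applied a continuity correction is not stated, so the sketch omits one.

```python
from math import erfc, sqrt
from typing import Tuple


def chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test (1 df, no continuity correction) on a 2 x 2 table.

        rows:    stress-responsive genes | other genes
        columns: with paralogues         | without paralogues

        [[a, b],
         [c, d]]

    Returns (statistic, p_value).
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("all row and column totals must be positive")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the chi-square upper tail equals erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))


stat, p = chi2_2x2(20, 10, 10, 20)  # hypothetical counts, not the paper's data
print(f"chi2 = {stat:.2f}, P = {p:.4f}")
```

For the toy table [[20, 10], [10, 20]] the statistic is 20/3 (about 6.67), just above the 6.63 critical value for P = 0.01, so P falls just under 0.01.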

Air exposure induced up to 67-fold upregulation of five highly expressed IAPs (Supplementary Fig. 24a). Other inhibitors of apoptosis were also upregulated: BCL2 up to fourfold and BAG up to 12-fold (Supplementary Fig. 24b). These apoptosis inhibitors were also highly upregulated under heat and low-salinity stress. Together with the expansion of IAPs, these findings suggest that a powerful anti-apoptosis system exists and may be critical for the remarkable tolerance of oysters to air exposure and other stresses. The existence of an intrinsic apoptosis pathway in invertebrates has been controversial, and parts of the pathway have only recently been demonstrated in two lophotrochozoans31, 32. The presence of key genes of both the intrinsic (BAX, BAK, BAG, BCL2, BI1 and procaspase) and extrinsic (TNFR and caspase 8) apoptosis pathways indicates that oysters have well-developed apoptosis systems. Powerful inhibition of apoptosis, as shown by genomic and transcriptomic analyses, may be central to the ability of oysters to tolerate prolonged air exposure and other stresses.

Heat stress induced a ~2,000-fold increase in the expression of five highly inducible HSP70 genes and a 13.9-fold increase in the average expression of all HSP70 genes, whose transcripts then amounted to 4.2% of all transcripts (Supplementary Figs 24c and 25). The genomic expansion and massive upregulation of HSP genes help to explain why C. gigas can tolerate temperatures as high as 49 °C when exposed to summer sun at low tide33. HSP genes were also upregulated under other stressors and may be central to the oyster’s defence against all stresses (Supplementary Fig. 25). HSPs may also inhibit apoptosis by binding to effector caspases34.
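
The fold changes above are reported both for individual HSP70 genes and for the average of the whole family. These are different quantities: the ratio of family means is dominated by the most highly expressed genes and can be far smaller than the largest per-gene ratio. The sketch below computes both from RPKM values; the optional pseudocount (to avoid division by zero for genes silent in controls) is an assumption, since the text does not say whether one was used, and the function names and values are hypothetical.

```python
from typing import Sequence


def fold_change(treatment: float, control: float, pseudocount: float = 0.0) -> float:
    """RPKM_treatment / RPKM_control, with an optional pseudocount (an assumption)."""
    denominator = control + pseudocount
    if denominator <= 0:
        raise ValueError("control expression is zero; supply a positive pseudocount")
    return (treatment + pseudocount) / denominator


def family_fold_change(treatment: Sequence[float], control: Sequence[float]) -> float:
    """Fold change of the family's mean expression (not the mean of per-gene ratios)."""
    if len(treatment) != len(control) or not treatment:
        raise ValueError("need paired, non-empty expression vectors")
    mean_treatment = sum(treatment) / len(treatment)
    mean_control = sum(control) / len(control)
    return fold_change(mean_treatment, mean_control)


# Hypothetical two-gene family: one strongly inducible gene, one unchanged gene.
treated = [200.0, 2.0]
control = [0.1, 2.0]
print([round(fold_change(t, c), 1) for t, c in zip(treated, control)])  # [2000.0, 1.0]
print(round(family_fold_change(treated, control), 1))                   # 96.2
```

In this toy family one gene rises 2,000-fold while the family-average fold change is only about 96, which is why the two figures reported for HSP70 differ by orders of magnitude.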

Genes involved in signal transduction, including genes coding for G-protein-coupled receptors and Ras GTPase, were also activated by stressors (Supplementary Fig. 24f) and over-represented in the oyster genome (Supplementary Table 11). These regulators may have a role in orchestrating stress responses, which seem to be well coordinated (Fig. 3a and Supplementary Fig. 25). The expansion of key defence genes and the strong, complex transcriptomic response to stress highlight the sophisticated genomic adaptations of the oyster to sessile life in a highly stressful environment.

Shell formation

Calcified shells provide critical protection against predation and desiccation in sessile marine animals such as oysters. Molluscan shells consist of calcium carbonate (CaCO3) crystals of either aragonite or calcite embedded in an elaborate organic matrix. Two models have been advanced for molluscan shell formation. The matrix model posits that mineralization occurs in a mantle-secreted matrix of chitin, silk fibroin and acidic proteins35, 36. Chitin and silk proteins are proposed to provide matrix structure, whereas acidic proteins control the nucleation and growth of CaCO3 crystals. The cellular model suggests that biomineralization is cell-mediated; that is, crystals are formed in haemocytes and then deposited at the mineralization front37.

We searched the oyster genome for genes implicated in shell formation in previous studies and examined their expression in different tissues and at different stages (Supplementary Text H1, 2). We also sequenced peptides from shells, mapped them to the genome and identified 259 shell proteins (Supplementary Text H3 and Supplementary Table 24). Although our search found evidence for the involvement of chitin, we did not find any silk-like proteins encoded in the oyster genome (Supplementary Text H2) but found, instead, many diverse proteins that may have roles in matrix construction and modification. Notably, a gene coding for a fibronectin-like protein was highly expressed at the early developmental stage when larval shells are formed, in unison with chitin synthase (Fig. 4a), and was mostly expressed in the adult mantle (40× other organ average; Fig. 4b); the fibronectin-like protein was among the most abundant proteins found in oyster shells. Genes coding for laminin and some collagen proteins were also highly expressed in the mantle (Supplementary Fig. 27a) and found in shells. These are typical extracellular matrix (ECM) proteins, and their presence in shells suggests that the shell matrix has similarities to the ECM of animal connective tissues and basal lamina. Unlike silk fibroins, which can self-assemble38, fibronectin fibrils in the ECM form through a cell-mediated process39. Oyster fibronectin-like proteins have five type-III domains for integrin binding and cell adhesion. Genes coding for integrins were highly expressed in haemocytes (4× other organ average; Supplementary Fig. 27b). Thus, haemocytes may organize fibronectin-like fibril formation in the shell matrix as they do in the ECM.
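
Statements such as "40× other organ average" compare expression in one organ with the mean across the remaining organs. A minimal sketch, assuming the arithmetic mean of RPKM values across the other adult organs (the precise definition is given in the Supplementary Information, not in this section); the organ names, values and function name are hypothetical.

```python
from typing import Mapping


def specificity_ratio(expression: Mapping[str, float], focal: str) -> float:
    """Expression in `focal` organ divided by the mean across all other organs.

    Assumption: "other organ average" is the arithmetic mean of RPKM values.
    """
    if focal not in expression:
        raise KeyError(focal)
    others = [value for organ, value in expression.items() if organ != focal]
    if not others:
        raise ValueError("need at least one other organ for comparison")
    mean_other = sum(others) / len(others)
    if mean_other == 0:
        # Undefined ratio: infinite if the focal organ expresses the gene, else NaN.
        return float("inf") if expression[focal] > 0 else float("nan")
    return expression[focal] / mean_other


# Hypothetical RPKM values for a mantle-enriched gene.
rpkm_by_organ = {"mantle": 400.0, "gill": 10.0, "digestive gland": 8.0, "haemocytes": 12.0}
print(specificity_ratio(rpkm_by_organ, "mantle"))  # 40.0
```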

Figure 4: Genes related to shell formation identified from mass spectrometry analysis of shell proteins and transcriptome data.

a, Relative expression (y axis) of genes coding for chitin synthase (gene CGI_10009438) and fibronectin-like protein (CGI_10016964) during early development corresponds to the formation of the shell gland and the first larval shells, as seen in the scanning electron micrographs. The white arrow denotes the invagination that forms the shell gland. Developmental stages (x axis) and their timeline are defined in Supplementary Table 12. b, In adults, the chitin synthase and fibronectin-like genes (same colours as in a) are almost exclusively expressed in the mantle compared with other organs. The fibronectin-like protein is also one of the most abundant proteins found in the shell. c, Distribution of shell proteins across diverse Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, indicative of general cellular functions. d, Expression of 26 tyrosinase genes in the mantle edge, pallial mantle and other organs. Tyrosinases are abundant in shells, and their higher expression in the non-pigmented pallial mantle indicates that their functions are not limited to melanogenesis but are related to shell formation.

The involvement of cells in shell formation is further supported by the functional diversity of proteins detected in shells. Many housekeeping proteins, such as elongation factor 1α and ribosomal proteins, were found in the shell; indeed, most oyster shell proteins are not structural proteins but are distributed in diverse metabolic pathways (Fig. 4c and Supplementary Table 25). This functional diversity of shell proteins mirrors that of cells, which is unexpected under the matrix model. Furthermore, 84% of the 259 shell proteins identified are not classical secreted proteins (Supplementary Text H3.4 and Supplementary Table 24); they may be part of cells or deposited by exosomes40. Supporting the presence of exosomes, 61 of the 259 shell proteins matched proteins in the exosome database41. Cells and exosome-like vesicles containing calcite crystals have been observed at the mineralization front37, 42, although their significance in shell formation is debated. This study provides molecular evidence for their presence inside shells and their probable participation in shell formation.

Many shell proteins are enzymes that may be involved in matrix construction or modification. A homologue of penicillin-binding protein is expressed almost exclusively in the mantle (72× other organ average) and is highly abundant in shells (Supplementary Fig. 27d). Penicillin-binding protein is a transpeptidase that crosslinks peptidoglycan in bacterial cell walls43 and may have similar functions in the shell matrix. Another notable enzyme found is tyrosinase. The oyster genome has an expanded set of 26 genes coding for tyrosinase, compared with one in Caenorhabditis elegans and two in humans; most tyrosinase genes are mantle specific (Fig. 4d), and tyrosinases are highly enriched among shell proteins (P = 8 × 10⁻⁶). Although tyrosinase is a key enzyme in melanogenesis44, 45, it is most highly expressed in the non-pigmented pallial mantle (Fig. 4d), indicating that it has other functions in the oyster. The mantle secretes tyrosine-rich proteins46, and oxidation of tyrosine may be essential for shell matrix maturation. Several proteinases and proteinase inhibitors are highly mantle specific and abundant in shells, and may be involved in matrix formation, modification and protection (Supplementary Table 24). Together, these results indicate that the oyster shell matrix is not formed simply by self-assembling silk-like proteins but by diverse proteins through complex assembly and modification processes that may involve haemocytes and exosomes.

Concluding remarks

We sequenced and assembled the genome of the Pacific oyster using an inbred individual, short-read next-generation sequencing (NGS) and a new fosmid-pooling and hierarchical assembly strategy. The draft assembly provides insight into a molluscan genome characterized by high polymorphism, abundant repetitive sequences and active transposable elements. Genomic, transcriptomic and proteomic analyses reveal unique adaptations of oysters to sessile life in a highly stressful intertidal environment, as well as the complexity of shell formation. The oyster genome sequence and comprehensive transcriptome data provide valuable resources for studying molluscan biology and lophotrochozoan evolution, and for the genetic improvement of oysters and other important aquaculture species.

Methods

The sequenced Pacific oyster is an inbred female produced by four generations of brother–sister mating. Genome sequences were generated on the Illumina platform using fosmid pooling and were assembled with a new hierarchical assembly strategy. Fosmid ends were sequenced using Sanger technology. Gene models were obtained by integrating de novo gene predictions with alignment-based evidence from homology and transcriptome data. Transcriptomes were sequenced on the Illumina platform. The shell proteome was characterized by mass spectrometry. All methods are described in detail in the Supplementary Information.
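Fosmid pooling helps because each pool contains only a small slice of the genome, so two clones from the same locus rarely share a pool. The Fig. 1 legend describes 1,613 pools of 90 clones, with each pool covering 0.57% of the diploid genome. The back-of-envelope sketch below estimates how often two clones in one pool cover the same locus. That situation produces the "unexpected connections" mentioned in the legend. This is a simplified model of our own, not the authors' analysis. It assumes equal-length clones placed uniformly at random and it ignores repeats and cloning bias. The function names are hypothetical.

```python
import math


def expected_colliding_pairs(n_clones: int, target_fraction: float) -> float:
    """Expected number of clone pairs in one pool that cover the same stretch of the target.

    Assumed model: each clone is an interval of length L placed uniformly at
    random on a circular target of length G, with target_fraction = n * L / G.
    Two such intervals overlap with probability ~2L / G, so summing over the
    n * (n - 1) / 2 pairs simplifies to (n - 1) * target_fraction.
    """
    return (n_clones - 1) * target_fraction


def prob_no_collision(n_clones: int, target_fraction: float) -> float:
    """Poisson approximation to the chance that a pool has no colliding pair."""
    return math.exp(-expected_colliding_pairs(n_clones, target_fraction))


# Values from the Fig. 1 legend.
N_POOLS, CLONES_PER_POOL = 1613, 90
DIPLOID_FRACTION = 0.0057                # per-pool share of the diploid genome
HAPLOID_FRACTION = 2 * DIPLOID_FRACTION  # same sequence measured against one haplotype

print(f"total clones: {N_POOLS * CLONES_PER_POOL:,}")  # 145,170

# Counting only same-haplotype overlaps uses the diploid fraction; counting any
# two clones from the same locus (either haplotype, e.g. allelic pairs) uses the
# haploid fraction.
for label, frac in [("same haplotype only", DIPLOID_FRACTION),
                    ("same locus, either haplotype", HAPLOID_FRACTION)]:
    pairs = expected_colliding_pairs(CLONES_PER_POOL, frac)
    clean = prob_no_collision(CLONES_PER_POOL, frac)
    print(f"{label}: {pairs:.2f} expected pairs, ~{clean:.0%} of pools collision-free")
```

Under these assumptions, about 0.5 same-haplotype overlaps are expected per pool. Counting allelic pairs as well raises this to about 1 same-locus collision per pool. So a typical pool needs at most a small number of corrections, rather than having to separate two haplotypes across the whole genome.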

References

  1. Ponder, W. F. & Lindberg, D. R. in Phylogeny and Evolution of the Mollusca (eds Ponder, W. & Lindberg, D. R.) Ch. 1, 1–18 (Univ. of California Press, 2008)
  2. Walters, E. T. & Moroz, L. L. Molluscan memory of injury: evolutionary insights into chronic pain and neurological disorders. Brain Behav. Evol. 74, 206–218 (2009)
  3. Talmage, S. C. & Gobler, C. J. Effects of past, present, and future ocean carbon dioxide concentrations on the growth and survival of larval shellfish. Proc. Natl Acad. Sci. USA 107, 17246–17251 (2010)
  4. Caldeira, K. & Wickett, M. E. Anthropogenic carbon and ocean pH. Nature 425, 365 (2003)
  5. Marin, F., Luquet, G., Marie, B. & Medakovic, D. Molluscan shell proteins: primary structure, origin, and evolution. Curr. Top. Dev. Biol. 80, 209–276 (2008)
  6. Hedgecock, D. et al. The case for sequencing the Pacific oyster genome. J. Shellfish Res. 24, 429–441 (2005)
  7. Sodergren, E. et al. The genome of the sea urchin Strongylocentrotus purpuratus. Science 314, 941–952 (2006)
  8. Small, K. S., Brudno, M., Hill, M. M. & Sidow, A. Extreme genomic variation in a natural population. Proc. Natl Acad. Sci. USA 104, 5698–5703 (2007)
  9. Li, R. et al. The sequence and de novo assembly of the giant panda genome. Nature 463, 311–317 (2010)
  10. Xu, X. et al. Genome sequence and analysis of the tuber crop potato. Nature 475, 189–195 (2011)
  11. Bonasio, R. et al. Genomic comparison of the ants Camponotus floridanus and Harpegnathos saltator. Science 329, 1068–1071 (2010)
  12. Dalloul, R. A. et al. Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo): genome assembly and analysis. PLoS Biol. 8, e1000475 (2010)
  13. Star, B. et al. The genome sequence of Atlantic cod reveals a unique immune system. Nature 477, 207–210 (2011)
  14. Harris, R. S. Improved Pairwise Alignment of Genomic DNA. PhD thesis, Pennsylvania State Univ. (2007)
  15. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009)
  16. Kitzman, J. O. et al. Haplotype-resolved genome sequencing of a Gujarati Indian individual. Nature Biotechnol. 29, 59–63 (2010)
  17. Suk, E. K. et al. A comprehensively molecular haplotype-resolved genome of a European individual. Genome Res. 21, 1672–1685 (2011)
  18. Sauvage, C., Bierne, N., Lapègue, S. & Boudry, P. Single nucleotide polymorphisms and their relationship to codon usage bias in the Pacific oyster Crassostrea gigas. Gene 406, 13–22 (2007)
  19. McGoldrick, D. J. & Hedgecock, D. Fixation, segregation and linkage of allozyme loci in inbred families of the Pacific oyster Crassostrea gigas (Thunberg): implications for the causes of inbreeding depression. Genetics 146, 321–334 (1997)
  20. Hillier, L. W. et al. Whole-genome sequencing and variant discovery in C. elegans. Nature Methods 5, 183–188 (2008)
  21. Li, R. et al. De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 20, 265–272 (2010)
  22. Gaffney, P. M., Pierce, J. C., Mackinley, A. G., Titchen, D. A. & Glenn, W. K. Pearl, a novel family of putative transposable elements in bivalve mollusks. J. Mol. Evol. 56, 308–316 (2003)
  23. Han, Y. & Wessler, S. R. MITE-Hunter: a program for discovering miniature inverted-repeat transposable elements from genomic sequences. Nucleic Acids Res. 38, e199 (2010)
  24. Barucca, M., Olmo, E. & Canapa, A. Hox and paraHox genes in bivalve molluscs. Gene 317, 97–102 (2003)
  25. Ferrier, D. E. K. & Holland, P. W. H. Ciona intestinalis ParaHox genes: evolution of Hox/ParaHox cluster integrity, developmental mode, and temporal colinearity. Mol. Phylogenet. Evol. 24, 412–417 (2002)
  26. Goldstone, J. V. et al. The chemical defensome: environmental sensing and response genes in the Strongylocentrotus purpuratus genome. Dev. Biol. 300, 366–384 (2006)
  27. Carland, T. M. & Gerwick, L. The C1q domain containing proteins: where do they come from and what do they do? Dev. Comp. Immunol. 34, 785–790 (2010)
  28. Hanington, P. C. & Zhang, S. M. The primary role of fibrinogen-related proteins in invertebrates is defense, not coagulation. J. Innate Immun. 3, 17–27 (2011)
  29. Zhang, S. M., Adema, C. M., Kepler, T. B. & Loker, E. S. Diversification of Ig superfamily genes in an invertebrate. Science 305, 251–254 (2004)
  30. Kourtis, N. & Tavernarakis, N. Cellular stress response pathways and ageing: intricate molecular relationships. EMBO J. 30, 2520–2531 (2011)
  31. Lee, E. F. et al. Discovery and molecular characterization of a Bcl-2-regulated cell death pathway in schistosomes. Proc. Natl Acad. Sci. USA 108, 6999–7003 (2011)
  32. Bender, C. E. et al. Mitochondrial pathway of apoptosis is ancestral in metazoans. Proc. Natl Acad. Sci. USA 109, 4904–4909 (2012)
  33. Galtsoff, P. S. The American oyster Crassostrea virginica Gmelin. Fishery Bull. 64, 1–480 (United States Govt Printing Office, 1964)
  34. Mosser, D. D., Caron, A. W., Bourget, L., Denis-Larose, C. & Massie, B. Role of the human heat shock protein hsp70 in protection against stress-induced apoptosis. Mol. Cell. Biol. 17, 5317–5327 (1997)
  35. Weiner, S., Traub, W. & Parker, S. Macromolecules in mollusc shells and their functions in biomineralization. Phil. Trans. R. Soc. Lond. B 304, 425–434 (1984)
  36. Furuhashi, T., Schwarzinger, C., Miksik, I., Smrz, M. & Beran, A. Molluscan shell evolution with review of shell calcification hypothesis. Comp. Biochem. Physiol. B Biochem. Mol. Biol. 154, 351–371 (2009)
  37. Mount, A. S., Wheeler, A. P., Paradkar, R. P. & Snider, D. Hemocyte-mediated shell mineralization in the eastern oyster. Science 304, 297–300 (2004)
  38. Stark, M. et al. Macroscopic fibers self-assembled from recombinant miniature spider silk proteins. Biomacromolecules 8, 1695–1701 (2007)
  39. Lemmon, C. A., Chen, C. S. & Romer, L. H. Cell traction forces direct fibronectin matrix assembly. Biophys. J. 96, 729–738 (2009)
  40. Keller, S., Sanderson, M. P., Stoeck, A. & Altevogt, P. Exosomes: from biogenesis and secretion to biological function. Immunol. Lett. 107, 102–108 (2006)
  41. Mathivanan, S., Fahner, C. J., Reid, G. E. & Simpson, R. J. ExoCarta 2012: database of exosomal proteins, RNA and lipids. Nucleic Acids Res. 40, D1241–D1244 (2012)
  42. Mount, A. S., Gohad, N. V., Hansen, D. C., Mueller, K. & Johnstone, M. B. Deposition of nanocrystalline calcite on surfaces by a tissue and cellular biomineralization. US patent 2010/0150982 A1. (2010)
  43. Sauvage, E., Kerff, F., Terrak, M., Ayala, J. A. & Charlier, P. The penicillin-binding proteins: structure and role in peptidoglycan biosynthesis. FEMS Microbiol. Rev. 32, 234–258 (2008)
  44. Nagai, K., Yano, M., Morimoto, K. & Miyamoto, H. Tyrosinase localization in mollusc shells. Comp. Biochem. Physiol. B 146, 207–214 (2007)
  45. Chang, T. S. An updated review of tyrosinase inhibitors. Int. J. Mol. Sci. 10, 2440–2475 (2009)
  46. Waite, J. H. in The Mollusca Vol. I (eds Hochachka, P. & Wilbur, K. M.) Ch. 11, 467–504 (Academic, 1983)
  47. Zhang, G. et al. Genomic data from the Pacific oyster (Crassostrea gigas). GigaScience. http://dx.doi.org/10.5524/100030 (2012)


Acknowledgements

We acknowledge H. Wu, F. Zhang, Q. Tang, Z. Zhu, X. Xu, H. Lin, J. Lei, Z. Xiang, N. Li, J. Xiang and J. Jia for their support of the oyster genome project. We thank F. Han, X. Liu, R. Wu, L. Wang, Y. Wu, L. Yan, H. Niu, H. Li, Y. Wang, J. Liang, Z. Jia, J. Davis and Taylor Shellfish Farms for assistance with DNA, RNA and protein extraction, data analysis and oyster culture, and Y. Lu, C. Lin, H. Peng, Y. Ren, X. Xu, R. Chen and D. Zhang for library construction and sequencing. We thank L. Song, B. Z. Liu, Q. Li, Z. Yu, C. Ke, J. Yu, B. Liu, X. Sun, R. W. Chapman, Y. Han, S. R. Wessler, D. Arendt, E. H. Davidson, J. S. Evans, B. Brown, P. Boudry and B. Lieb for discussions. We thank other faculty and staff at the Institute of Oceanology, Chinese Academy of Sciences, BGI-Shenzhen and Rutgers who contributed to the oyster genome project. We acknowledge grant support from the National High-Technology Research and Development Program of China (863 program; 2010AA10A110), National Basic Research Program of China (973 Program; 2010CB126401 and 2010CB126402), 863 program (2012AA10A405), Basic Research Program Supported by Shenzhen City (JC2010526019), Shenzhen Key Laboratory of Transomics Biotechnologies (CXB201108250096A), National Natural Science Foundation of China (40730845), Mollusc Research and Development Center, CARS, Shenzhen Key Laboratory of Gene Bank for National Life Science, Taishan Scholar and Scholar Climbing Programs of Shandong. X.G. acknowledges funding from the US Department of Agriculture (2009-35205-05052 and NJ32108) and the Chinese Academy of Sciences Marine Functional Genomics Oversea Team and Taishan Scholar Fund; P.W.H.H. acknowledges funding from the European Research Council (EU FP7 ERC grant 268513); and J.P. acknowledges funding from Beatriu de Pinós of the Generalitat de Catalunya (2009 BP-DGR). We are grateful to Dalian Zhangzidao Fishery Group Co. Ltd for providing support.

Author information

  1. These authors contributed equally to this work.

    • Guofan Zhang,
    • Xiaodong Fang,
    • Ximing Guo,
    • Li Li,
    • Ruibang Luo,
    • Fei Xu,
    • Pengcheng Yang,
    • Linlin Zhang &
    • Xiaotong Wang

Affiliations

  1. Institute of Oceanology, Chinese Academy of Sciences, Qingdao 266071, China

    • Guofan Zhang,
    • Li Li,
    • Fei Xu,
    • Linlin Zhang,
    • Xiaotong Wang,
    • Haigang Qi,
    • Huayong Que,
    • Fucun Wu,
    • Jiafeng Wang,
    • Jie Meng,
    • Jun Liu,
    • Na Zhang,
    • Qihui Zhu,
    • Yishuai Du,
    • Shoudu Zhang,
    • Peizhou Cheng,
    • Juan Li,
    • Wei Wang,
    • Tong Wang,
    • Jibiao Zhang,
    • Yingxiang Li,
    • Jinpeng Wang,
    • Xiaorui Song,
    • Ronglian Huang,
    • Xuedi Du,
    • Mei Yang,
    • Zhicai She,
    • Wen Huang,
    • Baoyu Huang,
    • Tao Qu,
    • Guoying Miao,
    • Qiang Wang,
    • Haiyan Wang &
    • Xiao Liu
  2. BGI-Shenzhen, Shenzhen 518083, China

    • Xiaodong Fang,
    • Ruibang Luo,
    • Pengcheng Yang,
    • Zhiqiang Xiong,
    • Yinlong Xie,
    • Yabing Zhu,
    • Yuanxin Chen,
    • Chunfang Peng,
    • Lan Yang,
    • Bo Wen,
    • Zhiyong Huang,
    • Yue Feng,
    • Yunjie Liu,
    • Xiaoqing Sun,
    • Binghang Liu,
    • Xuanting Jiang,
    • Dingding Fan,
    • Wenjing Fu,
    • Bo Wang,
    • Zhiyu Peng,
    • Na Li,
    • Maoshan Chen,
    • Fengji Tan,
    • Qiumei Zheng,
    • Hailong Yang,
    • Li Chen,
    • Longhai Luo,
    • Yao Ming,
    • Shu Zhang,
    • Yong Zhang,
    • Peixiang Ni,
    • Junyi Wang,
    • Ning Li,
    • Guojie Zhang,
    • Yingrui Li,
    • Huanming Yang,
    • Jian Wang,
    • Ye Yin &
    • Jun Wang
  3. Haskin Shellfish Research Laboratory, Institute of Marine and Coastal Sciences, Rutgers University, Port Norris, New Jersey 08349, USA

    • Ximing Guo,
    • Yan He,
    • Shan Wang &
    • Lumin Qian
  4. HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory, Hong Kong

    • Ruibang Luo,
    • Yinlong Xie &
    • Binghang Liu
  5. Department of Zoology, University of Oxford, Oxford OX1 3PS, UK

    • Peter W. H. Holland &
    • Jordi Paps
  6. Department of Biological Sciences, Clemson University, South Carolina 29634, USA

    • Andrew Mount
  7. Department of Biological Sciences, University of Southern California, Los Angeles, California 90089, USA

    • Dennis Hedgecock
  8. Atlantic Cape Community College, Mays Landing, New Jersey 08330, USA

    • Zhe Xu
  9. Laboratory of Evolutionary Genetics, Ruđer Bošković Institute, Bijenička cesta 54, P.P. 180, HR-10002, Zagreb, Croatia

    • Tomislav Domazet-Lošo
  10. School of Marine Science and Policy, University of Delaware, Lewes, Delaware 19958, USA

    • Patrick M. Gaffney
  11. Institute of Biology, Humboldt Universität zu Berlin Arboretum, Späthstraße 80/81, 12437 Berlin, Germany

    • Christian E. W. Steinberg
  12. Department of Biology, University of Copenhagen, DK-2200 Copenhagen, Denmark

    • Jun Wang
  13. The Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, DK-2200 Copenhagen, Denmark

    • Jun Wang

Contributions

G.Z. and X.G. conceived the study and designed scientific objectives. G.Z., Jun W. and X.G. led the project and manuscript preparation. Jun W., X.F. and Y.Y. developed the sequencing strategy. L. Li and X.F. managed the project. R.L. (leader), Yr.L., Z.H., Y.L., Xq.S., B.L., X.J., W.F., Qm.Z., H.Y., L. Luo, B. Wang, Y.M. and P.N. conducted assembly and evaluation; X.F. (leader), P.Y., Zq.X., Y.X., Yb.Z., Y.C., C.P., Y.F., D.F., L.Y., Z.P., Na L., X.W., M.C., L.C., S.Z., Jy.W., Ning L., Gj.Z. and Yr.L. performed genome annotation and data analysis; L. Li, F.X., Hy.Q., F.W., Sd.Z., Jp.W., X.D., J.Z., Q.W. and L.Q. cultured oysters and provided materials; Hg.Q. (leader), L. Li, Jf.W., Z.S. and H.W. performed polymorphism analysis and validation; F.X. (leader), P.W.H.H., J.P., T.D.L., P.Y., J. Liu, X.W., L. Li, N.Z., J. Li, W.W., Yx.L., M.Y. and W.H. conducted developmental biology studies and data analysis; L.Z. (leader), X.G., J.M., Qh.Z., Y.D., C.E.W.S., P.C., B.H., T.Q. and G.M. conducted stress studies and data analysis; X.W. (leader), X.G., T.W., Z. Xu, Y.H., A.M., Xr.S., R.H., B. Wen, F.T. and Y.Z. conducted shell-formation studies and data analysis. Hm.Y. and Jian W. supervised sequencing, assembly and bioinformatics analysis. X.G., S.W. and F.X. performed flow-cytometry analysis. D.H. and P.M.G. provided inbred oysters, BAC sequences and advice. L.Q. and X.L. participated in discussions and provided suggestions. X.G., X.F., L. Li, R.L., F.X., P.Y., L.Z., X.W., Hg.Q. and P.W.H.H. did most of the writing with contributions from all authors.

Competing financial interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to:

The oyster genome project has been deposited at DDBJ/EMBL/GenBank under the accession number AFTI00000000. All short-read data have been deposited into the Sequence Read Archive (SRA) (http://www.ncbi.nlm.nih.gov/sra) under the accession number SRA040229. Short-read data of re-sequencing have been deposited in the SRA under the accession number SRA043580. Raw sequencing data of transcriptomes have been deposited in the Gene Expression Omnibus under the accession number GSE31012. Genomic data are also available at the Comprehensive Library for Modern Biotechnology (CLiMB) repository: doi:10.5524/100030 (ref. 47).


Supplementary information

PDF files

  1. Supplementary Information (7.6M)

    This file contains Supplementary Text and Data, which includes Supplementary Materials, Methods and Results (see Contents list for details), Supplementary Figures 1–29, and Supplementary Tables 1–13, 15–20, 23, 25 and 28–29 (see separate zipped file for Supplementary Tables 14, 21, 22, 24, 26 and 27).

Zip files

  1. Supplementary Tables (14.4M)

    This file contains Supplementary Tables 14, 21, 22, 24, 26 and 27.
