Adaptation to deep-sea chemosynthetic environments as revealed by mussel genomes

Hydrothermal vents and methane seeps are extreme deep-sea ecosystems that support dense populations of specialized macrobenthos such as mussels. But the lack of genome information hinders the understanding of the adaptation of these animals to such inhospitable environments. Here we report the genomes of a deep-sea vent/seep mussel (Bathymodiolus platifrons) and a shallow-water mussel (Modiolus philippinarum). Phylogenetic analysis shows that these mussel species diverged approximately 110.4 million years ago. Many gene families, especially those for stabilizing protein structures and removing toxic substances from cells, are highly expanded in B. platifrons, indicating adaptation to extreme environmental conditions. The innate immune system of B. platifrons is considerably more complex than that of other lophotrochozoan species, including M. philippinarum, with substantial expansion and high expression levels of gene families that are related to immune recognition, endocytosis and caspase-mediated apoptosis in the gill, revealing presumed genetic adaptation of the deep-sea mussel to the presence of its chemoautotrophic endosymbionts. A follow-up metaproteomic analysis of the gill of B. platifrons shows methanotrophy, assimilatory sulfate reduction and ammonia metabolic pathways in the symbionts, providing energy and nutrients, which allow the host to thrive. Our study of the genomic composition allowing symbiosis in extremophile molluscs gives wider insights into the mechanisms of symbiosis in other organisms such as deep-sea tubeworms and giant clams. Genome sequences of a deep-sea vent and a shallow-water mussel species are compared. The former has expanded gene families for protein stabilization and removal of toxic substances and has a more complex immune system.

T he environment of deep-sea hydrothermal vents and methane seeps is characterized by darkness, lack of photosynthesisderived nutrients, high hydrostatic pressure, variable temperatures and high concentrations of heavy metals and other toxic substances 1,2 . Despite this hostile environment, these ecosystems support dense populations of macrobenthos which, with the help of chemoautotrophic endosymbionts, are fuelled by simple reduced molecules such as methane and hydrogen sulfide 1-3 . Studying vent and seep organisms may answer fundamental questions about their origin and adaptation to the extreme environments 1,4 . Deep-sea mussels (Mytilidae, Bathymodiolinae) often dominate at hydrothermal vents and cold seeps around the world. Owing to their ecological importance and remarkable biological characteristics, including their ability to survive an extended period under atmospheric pressure 5 , they are a recognized model for studies of symbiosis 6 , immunity 7 , adaptation to abiotic stress 8 , ecotoxicology 9 and biogeography 10 .

results and discussion
We sequenced the genomes of both B. platifrons Hashimoto and Okutani, 1994 (a deep-sea mussel, Fig. 1a) and M. philippinarum (Hanley, 1843) (a shallow-water mussel, Fig. 1b) in the family Mytilidae using a whole-genome shotgun approach and compared their features (Supplementary Note 1). Both assembled genomes were highly repetitive ( Supplementary Fig. 2). The genome of B. platifrons (1.64 Gb) was smaller than that of M. philippinarum (2.38 Gb), mainly because it has fewer repeats, particularly the Helitrontransposable elements and unclassified repeats that together contributed a 707 Mb difference between the two genomes (Supplementary Note 9). The N 50 value (the shortest sequence length at 50% of the genome size) of the B. platifrons scaffolds and contigs was 343.4 Kb and 13.2 Kb, respectively; and the N 50 value of the M. philippinarum scaffolds and contigs was 100.2 Kb and 19.7 Kb, respectively. The deep-sea and shallow-water mussel genome contained 96.3% and 93.7% complete and partial universal single-copy metazoan orthologous genes, respectively, indicating the completeness of the assembly and gene models (Supplementary Note 6). As with other bilaterians, the two mussel genomes contained all the expected Hox and ParaHox genes, as well as similar numbers of microRNAs, again suggesting that the assemblies encapsulate a nearly complete representation of genic information (Supplementary Note 13 and 14). Although the gills of B. platifrons contain endosymbiotic methane-oxidizing bacteria (MOB) 11 , the host scaffolds did not contain bacterial nucleotide sequences, showing a lack of horizontal gene transfer from the symbiont. B. platifrons has a lower rate of heterozygosity compared to the other three bivalves with a sequenced genome ( Supplementary  Fig. 3a), potentially owing to recurrent population bottlenecks as a result of population extinction and re-colonization 12 .
The B. platifrons genome had 33,584 gene models, 86.2% of which had transcriptome support, based on transcriptome data from three Articles NATure ecOlOgy & evOluTiON tissues in a previous study 13 , and our new transcriptome data from three additional tissues; whereas the M. philippinarum genome had 36,549 gene models, 86.0% of which had transcriptome support, based on our new transcriptome data from five tissues (Supplementary Note 5). A total of 5,527 B. platifrons gene models were validated by shotgun proteomics using gill samples (false discovery rate (FDR) < 0.01), and the gene-expression level correlated well (P < 1 × 10 −5 ) with their corresponding protein abundance (Supplementary Note 20). Comparisons among five molluscan genomes revealed 6,883 shared protein domains and 474 protein domains unique to B. platifrons (Fig. 2a and Supplementary Table 14).
A phylogenetic tree constructed using 375 single-copy orthologues from 11 lophotrochozoan species (Fig. 1c) and another tree constructed using transcriptome sequences of representative members of Mytilidae (Fig. 1d) showed that, within the genera used, Modiolus is the closest shallow-water relative of Bathymodiolus. Using the lophotrochozoan tree as a reference, the time of divergence between B. platifrons and M. philippinarum was estimated to be around 110.4 million years ago (Ma), with a 95% confidence interval of 52.4-209.7 Ma (Fig. 2b), which is close to the upper age limit of deep-sea symbiotic mussels (102 Ma) previously estimated using five genes 14 . This result supports the hypothesis that the ancestors of modern deep-sea mussels might have experienced an extinction event during the global anoxia period that has been associated with the Palaeocene-Eocene thermal maximum at around 57 Ma 15 .
Gene-family analysis among the 11 lophotrochozoans revealed the expansion of 111 protein domains and contraction of 39 domains in B. platifrons ( Fig. 2c and Supplementary Table 10). The number of contracted domains in the deep-sea mussel is the smallest among all the species examined. This indicated that adaptation of B. platifrons to the deep-sea chemosynthetic environment was mainly mediated by gene family/domain expansion, and the deepsea mussel has retained most genes of its shallow-water ancestors.
Among the most numerously expanded gene families in B. platifrons are the HSP70 family (179 proteins) and the ABC transporters (393 proteins). HSP70 is important for protein folding, and thus high expression of these genes in the gill (Supplementary Fig. 15) could indicate their active response to stresses such as changing temperature, pH and hypoxia. However, because it took approximately three hours from the sample collection to preservation, the high expression of some of the stress-related genes could also be caused by the stress experienced during the sampling and transportation. Nevertheless, the expression level of HSP70 has been reported to positively correlate with the level of DNA-strand breakage in a deep-sea mussel 16 . Therefore, expansion of this gene family may provide additional genetic resources allowing deep-sea mussels to cope with abiotic stressors. ABC transporters are known to move toxic chemicals outside mussel gill epithelial cells 17 , forming the first line of defence against toxic chemicals in the vent/seep fluid. Many of the ABC transporters were highly expressed in the gill ( Supplementary Fig. 16), indicating their active role in protecting the deep-sea mussel through detoxification.
Deep-sea mussel Bathymodiolus spp. obtain their symbionts from the environment, and house them in the vacuoles of gill epithelial cells 18 . Therefore, they must recognize the bacteria in the water column, incorporate them into the host cell, and control their population growth (Fig. 3). The target microbes carry signature molecules on their surface, which are recognized by the transmembrane pattern-recognition receptors (PRRs) of the host 19 . Among the PRRs in B. platifrons, there was a significant expansion of genes with an immunoglobin (Ig) domain and genes with an fibrinogen domain, both of which are critical in allowing the sea anemone Aiptasia to recognize its dinoflagellate symbiont 20 . In addition, peptidoglycan-recognition proteins (PGRPs) that can bind with peptidoglycan 21 on the bacterial cell wall were also expanded in the genome of B. platifrons. Many of these PRR genes were also highly a b  (1) The Toll-like receptor 13 (TLR13) was found to be expanded in the genome of B. platifrons, and these receptors were highly expressed in the gill ( Supplementary Fig. 20). TLR13 functions as an endosomal receptor in various groups of animals, which can sense microbes only after their internalization by phagocytosis 22 .
(2) Two adhesion gene families (syndecan and protocadherin) were expanded in the genome of B. platifrons, and these families have been reported to mediate endocytosis in other species 23 Among the positively selected genes, a Wiskott-Aldrich syndrome protein and SCAR homologue (WASH) functions as a nucleationpromoting factor (NPF) to stimulate actin polymerization on endosomal membranes and consequently regulate vesicular trafficking 25 .
Apoptosis plays multiple functions in host-symbiont interactions such as post-phagocytic winnowing in coral-dinoflagellate symbiosis 26 and symbiont recycling in insect-bacteria symbiosis 27 . In B. platifrons, several families of the classic apoptosis system including TNF, DEATH, CARD and caspases were remarkably expanded and were highly expressed in the gill (Table 1 and Supplementary Note 18). As the central controllers of programmed cell death, caspases initiate the transduction of apoptotic signals by activation of members of the TNF superfamily and their DEATH receptor 28 , which can eventually result in the release of lysozymes to digest the symbionts 29 . A TUNEL assay showed the presence of numerous apoptotic cells in deep-sea mussels 30 and in late-juvenile stages of vent tubeworms when they have acquired symbionts 31 , further substantiating the evolutionarily conserved role of apoptosis in establishing the host-symbiont relationship in various groups of invertebrates. In addition, as required to maintain homeostasis in symbiosis, B. platifrons has a complicated anti-apoptosis system with 130 inhibitors of apoptosis proteins (IAPs), which exceeds the number of IAPs in any other lophotrochozoan species available for comparison (Fig. 4). The high expression of many IAP genes is consistent with their role in maintaining the symbiont population in the gill of B. platifrons (Fig. 4b). Therefore, the expansion of the apoptosis-and anti-apoptosis-related gene families could contribute to the regulation of the symbiont population in Bathymodiolus.
To better understand the relationship between the host and its symbionts, 220 MOB proteins were identified through metaproteomics of the B. platifrons gill (Supplementary Note 20), leading to the discovery of three key metabolic pathways in the symbionts (Fig. 3). The methane-oxidation pathway, indicated by high expression of two signature enzymes of methanotrophy (that is, methane monooxygenase and methanol dehydrogenase) 11 , fuels the host-symbiont relationship ( Supplementary Fig. 28). The assimilatory sulfate reduction pathway, identified by several adenylylsulfate kinases, produces sulfur-containing amino acids 32 . The ammonia-assimilation pathway, identified by glutamate-ammonia ligase and glutamate synthase, detoxifies ammonia and provides the symbionts and host with the much needed organic nitrogen in the deep sea 11 .

NATure ecOlOgy & evOluTiON
Our study has provided genomic resources for understanding how the deep-sea mussel has adapted to extreme abiotic stresses and absence of photosynthesis-derived energy and nutrients in the chemosynthetic ecosystems. The general mechanisms of symbiosis revealed in our study are of relevance to other symbiotic organisms such as deep-sea tubeworms and clams, as well as shallow-water corals.

Methods
Mussel collection and DNA extraction. Individuals of B. platifrons were collected using the manned submersible Jiaolong from a methane seep (22° 06.921′ N, 119° 17.131′ E, 1,122 m deep; Supplementary Fig. 1) in the South China Sea in July 2013 13 . The mussels were kept in an enclosed sample chamber placed in the sample basket of the submersible, and it took approximately three hours from sampling to sample preservation. Once the samples were brought to the upper deck of the mothership, the adductor muscle of an individual was dissected, cut into small pieces and immediately fixed in RNAlater. The sample was then transported to Hong Kong Baptist University (HKBU) on dry ice and stored at − 80 °C until use. DNA was extracted using the CTAB method, and its quality and quantity was checked with standard agarose gel electrophoresis and a Qubit Fluorometer, respectively.
The shallow-water horse mussels M. philippinarum were collected from a sandy shore at Ting Kok, Hong Kong (22° 46.989′ N, 114° 21.389′ E; Supplementary Fig. 1) during low tide in April 2014. The mussels were transported to HKBU and maintained in a seawater aquarium until use. The species identity was confirmed based on a morphological description 33 . DNA from the adductor muscle was extracted using the DNeasy Blood & Tissue Kit (Qiagen, Hilden, Germany).
Genome sequencing and reads cleaning. For B. platifrons, nine libraries with insert sizes ranging from 180 bp to 16 kb were constructed using different preparation methods (see Supplementary Table 1a for details) and sequenced by Illumina Hiseq2000 and Hiseq2500 platforms. For M. philippinarum, six libraries with insert sizes ranging from 180 bp to 5 kb were constructed and sequenced by Illumina Hiseq2000 and Hiseq2500 platforms. Low-quality reads and sequencing-adaptor-contaminated reads were trimmed and error corrected. Reads of mate-pair libraries were de-duplicated and sorted.
Transcriptome sequencing. For B. platifrons, the mantle, foot and gill transcriptomes were generated from a single B. platifrons as previously reported 13 . The adductor muscle, visceral mass and ovary tissues from another B. platifrons collected from a hydrothermal vent at the Noho site in November 2015 were dissected on board and fixed in RNAlater, and their RNA was extracted using TRIzol and sent to BGI (Shenzhen) for sequencing using a eukaryotic mRNA enrichment and library preparation protocol. For M. philippinarum, RNA was extracted from the mantle, foot, gill, adductor muscle and visceral mass and sequenced using the same method as for B. platifrons. The RNA-sequencing (RNA-seq) data are summarized in Supplementary Table 5. Raw reads were trimmed and assembled using Trinity version 2.0.6 34 .
Genome assembly. The genomes were assembled using Platanus version 1.21 35 . Our genome-assembly pipeline involved assembling of DNA sequences using different parameters, merging results from different assemblies, removing redundant scaffolds, further scaffolding using transcripts, filling the gaps and correcting the errors (Supplementary Fig. 4).

Identification of repeats and transposable elements.
Repeats and transposable elements were screened using the RepeatModeler pipeline version 1.0.8. RMblast was used to search and construct a species-specific repeats library of the genome, and RepeatMasker 36 was used to identify, classify and mask the repeats with the species-specific repeats library and repeats in the RepBase with the default settings.
Genome annotation. Trinity was used to generate another version of transcripts using the genome-based mode. The transcriptome data from the de novo and genome-assisted approaches were further assembled using PASA pipeline version 2.0.2 37 with BLAT as the aligner. The two mussel genomes were annotated using MAKER version 3.0 38 . MAKER was run twice, in order to train the gene models and increase the gene-predication accuracy. The gene models were further filtered by running OrthoMCL 39 with the protein models from six species of mollusks (Crassostrea gigas, Pinctada fucata, Lottia gigantea, Aplysia californica, Biomphalaria glabrata, and Octopus bimaculoides) and sequences that could be clustered with these molluscs were retained. Transcriptome reads were then mapped to the remaining 'orphan' genes, and only those supported by at least 10 parts per million (ppm) reads were retained. BLASTp was applied to search the predicted mussel protein models against the whole non-redundant database with an E value of 1 × 10 −5 . The.xml format files were uploaded to BLAST2GO to obtain the GO and InterprotScan annotation. Pfam annotation was performed by searching the 16,295 Pfam entries (version 29.0) using a hidden Markov model (HMM) 40 . The clean RNA-seq reads were quantified using kallisto 41 . Gene-expression levels were quantified based on transcripts per million (TPM). Table 3) were downloaded and analysed together with data from the two mussels used in the phylogenomic analysis. OrthoMCL 39 was applied to determine and cluster gene families, and the BBH approach in BLASTp was used with a threshold value of 1 × 10 −5 . This analysis resulted in 35,057 gene families among the 11 species. Only single-copy orthologues in each gene cluster were concatenated without partition information for the phylogenetic analysis using RaxML 42 with the GTR + Γ model and slow bootstrapping.

Phylogenomics. Protein sequences of the nine published or public lophotrochozoan genomes (Supplementary
Divergent time among the mollusks was estimated using MCMCTree, which applied the Bayesian estimation method for species divergence 43 . The maximumlikelihood tree obtained using the 11 lophotrochozoan species was used to provide a reference tree topology, and the WAG + Γ protein substitution model was employed on each site. The tree was calibrated with the following time frames to constrain the age of the nodes between the species: minimum = 305.5 Ma and soft maximum = 581 Ma, for C. teleta and H. robusta 44 ; minimum = 168.6 Ma and soft  To understand the phylogenetic positions of these two mussels in the family Mytilidae better, transcriptome sequences from six additional mytilid species were downloaded from NCBI and re-assembled where appropriate, with details shown in the Supplementary Information. Protein sequences from each species were grouped into gene families using OrthoMCL 39 as described above. Only the one-to-one orthologues were aligned, trimmed and concatenated. Maximum-likelihood trees were built using RaxML 42 with the GTR + Γ model and slow bootstrapping. Gene-family evolution. We applied a sensitive HMM scanning method on known pfam functional protein domains to classify the gene families 46 . Annotation of Pfam domains were performed on all 11 lophotrochozoan species after excluding protein domains of various transposable elements. A domain with multiple copies in a protein was counted only once. Thereafter, in order to identify gene-family expansion/contraction events in B. platifrons, a two-tailed Fisher's exact test was conducted. The Pfam domain counts in B. platifrons were compared against the background average domain counts of the 9 non-mussel mollusks (that is, species other than B. platifrons and M. philippinarum). In addition, OrthoMCL 39 was applied to cluster gene families and CAFÉ version 3.1 was applied to determine the significance of gene family expansion/contraction 47 . For each gene family, CAFÉ generated a family-wide P value, with a significant P value indicating a possible gene-family expansion or contraction event. A branch-specific P value was also generated for each branch/node using the Viterbi method. A family-wide P value less than 0.01 and a branch/node Viterbi P value less than 0.001 was considered as a signature of gene family expansion/contraction for a specific gene family and specific species, respectively, as recommended for CAFÉ version 3.1 in ref. 47 .

Proteomics characterization.
Gill samples from three individuals of B. platifrons fixed separately in RNAlater were used in proteomic analysis 48 . In brief, the samples were homogenized in a lysis buffer containing 8 M urea and 40 mM HEPES (pH = 7.5), sonicated to break the cells up, centrifuged at 4 °C at 15,000g for 15 min. The supernatant that contained the proteins was collected, purified with a methanol/chloroform protein-precipitation method, and re-constituted in lysis buffer. Three samples were then pooled in equal amounts, reduced using dithiothreitol, alkylated using iodoacetamide, and digested using sequencing-grade Trypsin. The digested sample, containing approximately 1 mg peptides, was fractionated using a strong cation-exchange (SCX) column following the procedure described previously 48 .
Each SCX fraction was re-dissolved in 0.1% formic acid and analysed on a LTQ-Orbitrap Elite mass spectrometer coupled to an Easy-nLC (Thermo Fisher, Bremen, Germany) equipped with an Acclaim PepMap RSLC C18 column (Thermo Fisher Scientific, Sunnyvale, California, USA).
The MS/MS data were uploaded into Mascot version 2.3.2 (Matrix Sciences, London, UK) to search against the deduced B. platifrons protein sequences, with their reversed sequences used to calculate the FDR. The searching parameters were as follows: peptide charge: + 2, + 3 and + 4; fixed modification: carbamidomethyl (C); variable modification: oxidation (M) and deamidated (NQ); maximum missed cleavage: 2; peptide mass tolerance: 4 ppm; and fragment mass tolerance: 0.05 Da. The FDR was dynamically controlled at 0.01.
Because the gills of B. platifrons contain MOB 11,49 that belong to the family Methylococcaceae 13 , we also identified and quantified the bacterial proteins in the gill samples to gain insights into the symbiotic relationship between the bacteria and their host. Protein sequences from 24 published bacterial genomes in Methylococcaceae were downloaded and appended with the 1,402 bacterial that were identified in our previous meta-transcriptome analysis 13 . This created a database with 86,023 bacterial proteins. The database search criteria were identical to those described previously for the mussel proteins. The identified/matched protein sequences were further annotated by BLASTp against the KEGG database using an E value of 1 × 10 −7 .
Determination of positively selected genes. Single-copy orthologues were selected among B. platifrons, M. philippinarum, C. gigas and P. fucata after running the OrthoMCL pipeline 39 as mentioned above. For each single-copy orthologue group, they were aligned, trimmed and concatenated. The phylogenetic tree was constructed using RaxML 42 with the GTR + Γ model and rapid bootstrapping.
Signatures of positively selected genes along a specific branch can be detected by branch-site models implemented in the codeml module of the phylogenetic analysis by maximum likelihood (PAML) package version 4.8 50 .
The branch-site model requires a lineage designated as 'foreground' phylogeny (that is, B. platifrons in the former phylogenetic tree) and the rest of the lineages as 'background' phylogeny (that is, M. philippinarum, C. gigas, and P. fucata in the former phylogenetic tree). Alternative model (model = 2, NSsite = 2, fix_omega = 0) and null hypothesis (model = 2, NSsite = 2, fix_omega = 1 and omega = 1) likelihood values were tested using χ 2 analysis. Only genes that had Bayesian Empirical Bayes (BEB) sites > 90 % and a corrected P value less than 0.1 were considered to have undergone positive selection. A KEGG-pathway enrichment was performed with a χ 2 test (if any of the components are over 5) or the Fisher's exact test (all other analyses).
Data availability. Raw reads, assembled genome sequences and annotation are accessible from NCBI under BioProject numbers PRJNA328542 and PRJNA328544, Sequence Read Archive accession numbers SRP078287 and SRP078294, and Whole Genome Shotgun project numbers MJUT00000000 and MJUU00000000. The genome assemblies and annotations are also available from the Dryad Digital Repository at http://dx.doi.org/10.5061/dryad.h9942.