The Lingula genome provides insights into brachiopod evolution and the origin of phosphate biomineralization

The evolutionary origins of lingulid brachiopods and their calcium phosphate shells have been obscure. Here we decode the 425-Mb genome of Lingula anatina to gain insights into brachiopod evolution. Comprehensive phylogenomic analyses place Lingula close to molluscs, but distant from annelids. The Lingula gene number has increased to ∼34,000 by extensive expansion of gene families. Although Lingula and vertebrates have superficially similar hard tissue components, our genomic, transcriptomic and proteomic analyses show that Lingula lacks genes involved in bone formation, indicating an independent origin of their phosphate biominerals. Several genes involved in Lingula shell formation are shared by molluscs. However, Lingula has independently undergone domain combinations to produce shell matrix collagens with EGF domains and carries lineage-specific shell matrix proteins. Gene family expansion, domain shuffling and co-option of genes appear to be the genomic background of Lingula's unique biomineralization. This Lingula genome provides resources for further studies of lophotrochozoan evolution.


Supplementary Notes
Supplementary Note 1: Background information, materials and methods

Background information of the brachiopod Lingula anatina
Although superficially resembling mussels, lingulid (i.e., tongue-shaped) brachiopods, including Lingula anatina, have several unique features that distinguish them from bivalves. These include flexible, dorso-ventral shells made of calcium phosphate without hinges, chitinous chaetae on the mantle margins, two arms lined with ciliated tentacles (i.e., lophophores) for filter feeding, and a tail-like structure (i.e., pedicle) to attach to hard substrate 52,53 (Fig. 1a). In addition, their early embryonic development is like that of basal deuterostomes 54  In order to answer some of the questions mentioned above, we sequenced the genome, transcriptomes, and shell proteome of the lingulid brachiopod, Lingula anatina. Using these data, we applied comparative genomic analyses to provide insights into the evolutionary history of this lophotrochozoan and the origin of phosphate biomineralization (A full list of proteomes and genomes of selected metazoans used in this study is shown in Supplementary Table 1).

Biological materials
Gravid Lingula adults (Fig. 1a) were collected during July to August in Kasari Bay, Amami Island (28.440583 N 129.667608 E; Supplementary Fig. 1a). The tissues were then completely homogenized with a Polytron handheld homogenizer.
Homogenized samples were centrifuged and supernatants were used for RNA extraction with the standard TRIzol protocol. The RNA quality of both embryonic and adult extracts was checked with an Agilent Technologies 2100 Bioanalyzer using an Agilent RNA 6000 Nano Kit.

Genome sequencing, assembly, gene modeling, and annotation
Sequencing, assembly and annotation of the Lingula anatina genome are shown in Supplementary Fig. S2. To avoid contamination with environmental microbes, we extracted genomic DNA from a gravid male gonad (i.e., mostly sperm cells). We sequenced the Lingula genome with next-generation sequencing (NGS) with a hybrid approach using three different  80 , and upgraded scaffolds were produced with SSPACE-LongRead (v1-1) 81 (Supplementary Table 2). Gaps in the scaffolds were then filled using GapCloser (v1.12-r6) from the SOAPdenovo2 package (r240) 82 ( Supplementary Fig. 2a). Redundancy of the final scaffolds was removed using a custom Perl script (calculating BLASTN alignment length and identity, Chuya Shinzato, personal communication) 14 . After gap closing, there were 17.5 Mb of gaps (4.1%) in the final assembly ( Supplementary Fig. 2d). The estimated size of the Lingula genome was approximately 463 Mb, based on flow cytometry ( Supplementary Fig. 1b). To estimate the genome size further, we  Fig. 1d).
Heterozygosity rate of the Lingula genome was 1.6% based on calculation of the ratio of homozygous and heterozygous peaks ( Supplementary Fig. 1d), meaning that SNP occurs about once every 62 bp. This ratio is higher than that of humans (0.043%). Regions of repetitive sequences were identified with RepeatScout (v1.0.5) 84 and then masked with RepeatMasker (v4.0.3) 85 . The Lingula genome is less repetitive (22.2%) than the pearl oyster genome 10 (Supplementary Fig. 2d; see also Supplementary Note 3.5). The Lingula genome shows low GC content (36.3%), which is similar to mollusc genomes ( Supplementary Fig. 1e,f). The final assembly size of the Lingula genome was 425 Mb, which was in the range predicted by two different types of estimate. Based upon this genome size, we sequenced the Lingula geneome with approximately 226-fold coverage.
The quality and completeness of the genome assembly were assessed by searching for the set of 248 core eukaryotic genes using CEGMA (v2.4.010312) 86   To obtain high quality gene models, we performed deep mRNA sequencing (RNA-seq) to obtain transcript information ( Supplementary Figs 2b and 4). Gene models were predicted with trained AUGUSTUS (v3.0.2) 89 using hints from spliced alignment of transcripts to the masked genome assembly produced with BLAT 87 and PASA (r20130907) 90 (Supplementary Fig. 2c).
There are 34,105 gene models predicted from the repeat-masked genome, which is higher than other lophotrochozoans 8,9 ( Supplementary Fig. 1d). A BLAST top-hits search against the NCBI nr database using BLAST+ (v2.2.29+; e-value, 1e-5) 91 and Blast2GO 92 shows that 28% of the Lingula gene models have best hits among molluscs, implicating a close relationship between Lingula and molluscs ( Supplementary Fig. 5). On the other hand, 21% of the genes show no hits to known sequences, suggesting these genes may specifically pertain to the Lingula lineage ( Supplementary Fig. 5); however, we cannot exclude the possibility of overestimation in which gene model errors may contribute to this estimate. In agreement with BLAST top-hits results, Lingula has an average gene size of 6.7 kb with 6.6 introns per gene ( Supplementary Fig. 2d), which is closer to the sea snail, Lottia, than to the leech, Helobdella, or the polychaete, Capitella 8 .

Transcriptome analysis of embryos and adult tissues
To study the spatiotemporal expression of genes, RNA-seq of 369 M read pairs from embryos and adult tissues was conducted with an Illumina HiSeq 2500 (Supplementary Fig. 4 and Supplementary Table 4). Transcripts were assembled de novo with Trinity (r2013_08_14, a de Bruijn graph-based program) 93,94 and used as an expression evidence for gene model prediction ( Supplementary Fig. 2b). In addition, to allow the transcriptome more accessible for downstream analysis, we eliminated transcript assemblies that contained computation errors, expressed at extremely low levels, and expressed with highly similar isoforms. After RNA-seq assembly, we mapped back all Q20 reads from each embryonic stage and adult tissue using Bowtie (v2.1.0) 95 , followed by estimation of the transcript abundance using RSEM (v1.2.5) 96 . We filtered transcripts using the criteria that the expression level of fragments per kilobase of transcript per million mapped reads (FPKM) lower than one and isoform appearance less than 5%. In addition, redundant isoforms were removed with CD-HIT (v4.6) 97 using 95% identity as a criterion. This step removed 61% of the transcripts from the primary assembly ( Supplementary Fig. 4b Fig. 4b,c). In order to assess the quality of the transcriptome assembly, we applied full-length transcript analysis using a bundled Perl script "analyze_blastPlus_topHit_coverage.pl" in the Trinity package 94 . The Lingula transcriptome is of high quality in terms of the number of full-length transcripts, which is comparable to the best annotated animals and is the best when selected gene models and the transcriptome are compared ( Supplementary Fig. 4d).

Proteomic analyses of shell matrix proteins
To provide insights into biomineralization in one of the most ancient animals, we used a proteomic approach to study the Lingula shell matrix. The shell of one Lingula adult was dissected and stirred in 12.5% NaClO. Soft tissue remaining on the shells was removed with Milli-Q water. The cleaned shell was mechanically crushed and ground into fine powder, and then treated with 12.5% NaClO to remove remaining contaminants. In order to remove minerals On the other hand, the insoluble pellet from the acetic acid solution, which contained acid insoluble proteins (AIP), was rinsed with Milli-Q water and then re-suspended in reducing buffer.
ASP and AIP samples were mixed with sample buffer, respectively, and subjected to SDS-PAGE.
Extracted shell matrix proteins (SMPs) were resolved in a 10-20% gradient gel, visualized with SimplyBlue SafeStain and SYPRO Ruby staining. Afterward, protein bands were excised and ingel digested with trypsin. Peptides were identified with LC-MS/MS performed as previously described 102 .
In brief, digested peptides were analyzed using a capillary liquid chromatography system (Dionex, UltiMate 3000) connected to a mass spectrometer (Thermo Scientific, LTQ-XL). Raw spectra were processed using SEQUEST software to extract peak lists 103 . Resulting peak lists were analyzed using an in-house MASCOT (v2.3.2) server against Lingula predicted gene models. We did not apply the "two-peptide" rule here since we noticed that this approach introduces bias and often leads to loss of information 104 . Instead, peptide-hits were qualityfiltered using ion score significance thresholds (>45; i.e., false discovery rate, FDR < 0.05), and high-quality one-peptide hits were retained.
Lingula SMPs were first analyzed in regard to molecular weight, theoretical pI, and amino acid composition with ProtParam 105 using a custom Python script with the module, "Bio.SeqUtils.ProtParam" from Biopython. They were then searched against the NCBI nr database by BLASTP to assign homology and possible function. Furthermore, they were categorized into secreted and non-secreted proteins using SignalP (v4.1) 106 . Secondary structure was predicted by PSIPRED (http://bioinf.cs.ucl.ac.uk/psipred/) 107 . Since one features of SMPs is that they contain repetitive sequences for initiating deposition of proteins and minerals, repeats within SMPs were detected with RADAR (http://www.ebi.ac.uk/Tools/pfa/radar/) 108

Phylogenetic position of brachiopods: background
The Phylum Brachiopoda comprises of three major subphyla, Linguliformea, Craniiformea, and Rhynchonelliformea 7 , the former including Lingulida and some other orders. Of these, Lingulida is the only lineage that has survived until the present, and it also has a rich fossil record dating to the Crambrian Period 53 . In spite of a great number of fossil species, our knowledge of brachiopod phylogeny is still limited. Before the 1980s, zoologists thought that brachiopods were deuterostomes based upon their mode of development ( Fig. 1b-i). They were then grouped with protostomes by 18S rRNA analysis 110 . This classification was further confirmed by analyzing Hox genes in brachiopods and priapulids 111 . However, hypotheses on the evolutionary origin of brachiopods that come from paleontological [112][113][114] and embryological 115 studies have been highly debated. In addition, brachiopod phylogenetic position based on molecular phylogeny is still controversial (see Supplementary Fig. 6 for three types of topology; see Supplementary Table 5 for the list of full comparison  Table 5). As a result, the only consistency among these studies is that brachiopods are always grouped with phoronids, which confirms the previously proposed clade Brachiozoa (i.e., Brachipoda+Phoronida) 120 ( Supplementary Fig. 6). On the other hand, the relationship between Brachiozoa and Nemertea is unclear. In opposition to the idea of "Kryptrochozoa," an analysis based on 11 protein coding genes and 2 ribosomal RNA genes from 96 taxa showed that the sister group of Brachiozoa is Mollusca but not Nemertea 22 ( Supplementary Fig. 6, Type 1; Supplementary Table 5). In agreement with this, analyses of SSU and LSU from 22 taxa showed similar results, suggesting Nemertea is not close to Brachiozoa 23 .
In addition, a close relationship between Brachiopoda and Mollusca was supported by a large scale analysis using a 1,487 gene-matrix 2 and a broader sampling with 113 taxa 25 .
Accordingly, there is still an unresolved phylogenetic issue with brachiopods. It is therefore useful to have genomic data to understand the phylogeny of lophotrochozoans. Since Lingula has been recognized as the most primitive group of brachiopods due to its anatomical features and life history 61 , comparative studies of the Lingula genome with decoded genomes from three molluscs and two annelids 8-10 can provide useful information to interpret the evolution of brachiopods. Since there are no published data on genomes of Phoronida and Nemertea, the present analysis did not include these phyla for the analyses, which will be the subject of future studies on lophotrochozoan evolution.

Molecular phylogeny of Lingula
To identify robust phylogenetic markers, two strategies were applied. First, OrthoMCL (v2.0.9) 121,122 was used to cluster orthologous gene groups from 22 selected metazoan proteomes (Supplementary Table 1, asterisks), and then orthologs with one-to-one orthologous relationships were selected for further analyses using custom Perl scripts. Second, homology searches using a bidirectional best hits (BBH) approach 123 with BLASTP and custom Bash scripts identified the best orthologous pairs among many-to-many orthologous relationships. Alignments of orthologs were performed with MAFFT (v7.130b) 124 . Unaligned regions were then trimmed with Gblocks (v0.91b) 125 or TrimAl (v1.2rev59) 126 . Trimmed alignment blocks were concatenated with a Perl script catfasta2phyml.pl (https://github.com/nylander/catfasta2phyml). Finally, the maximum likelihood method with LG+Γ4 127 and GTR+Γ4 128 models was used to construct phylogenetic trees by RAxML (v8.0.5) 129 . Bayesian trees were constructed with PhyloBayes (v3.3f) 130 using LG+ Γ4 and GTR+Γ4 models with the first 500 trees as a burn-in. After a run time of ~20 days (with approximately 4,000 generations), convergence of the tree topology was post-analyzed by sampling every 10 trees.
Three different phylogenetic positions of brachiopods related to other lophotrochozoans have been proposed (Supplementary Fig. 6). Since we did not include phoronids and nemerteans in our current analysis, an alternative version of the current hypotheses on the relationship among Lingula, molluscs, and annelids is topologically simplified (Supplementary Fig. 7a). To resolve these problems, we applied extensive phylogenetic analyses using genomic scale data to identify robust orthologs among selected genomes. We selected orthologs, which have only one copy in each genome, where the evolutionary pressure is supposed to be similar and the orthologous relationship is unambiguous. We selected three different marker sets including 150 one-to-one orthologs identified by OrthoMCL orthologous groups (OG) from 15 genomes (including ecdysozoans, coral, and sponge), 515 one-to-one orthologs identified by OrthoMCL OG from 10 genomes, and 2,295 ortholog pairs selected with BBH from many-tomany orthologous relationships from 10 genomes (Supplementary Fig. 7b).
Gene ontology (GO) analysis of the term biological process shows that the selected markers belong to core metabolic processes, such as ribonucleoprotein complex biogenesis and RNA processing, suggesting that they are more likely to indicate reliable evolutionary history than other highly specific genes ( Supplementary Fig. 7c). The results based on 150 one-to-one orthologs from 15 genomes (with 46,845 amino-acid positions using sponges as an outgroup) indicate that Lingula is closely related to Mollusca rather than Annelida (Fig. 1j). Further analyses using 515 one-to-one orthologs and 2,295 orthologs found with BBH from 10 genomes (removal of sponges, corals and ecdysozoans) provided results to support this conclusion ( Supplementary Fig. 7d,e). Bayesian analysis using 150 and 515 markers tested by posterior probability also yielded the same result as that of maximum likelihood (Supplementary Fig. 7f).
With respect to currently available genome resources, our data confirm that Brachiopoda is closer to Mollusca, favoring the type-1 topology of the current hypothesis ( Supplementary Figs 6 and   7a).

Evolutionary rate of genes associated with basic metabolism
Protein-coding genes of another "living fossil," the coelacanth, have been reported to be evolving significantly more slowly than those of other tetrapods 131 . Similarly, we found that a Lingula gene-set associated with basic metabolism (Supplementary Fig. 7c) showed the slowest evolutionary rate (i.e., the amino acid substitution rate in terms of branch length of the tree) compared to other lophotrochozoans ( Fig. 1j and Supplementary Fig. 7d). This slow rate may be one of reasons why Lingula has retained its shell form with little modification for more than 520 million years.

Further analyses on selection of phylogenetic markers
To examine more carefully the issue of selecting proper phylogenetic markers, we further performed extensive analyses on the effects of using phylogenetic markers with different substitution rates when determining the phylogenetic relationship of Brachiopoda, Mollusca, and Annelida. We calculated the evolutionary rate of the given orthologs ("gene rate" hereafter) by summing the total branch length of the gene tree in selected genomes using a custom Perl script with a BioPerl module Bio::TreeIO. We then examined the distribution of 515 one-to-one orthologs from 10 selected genomes, and categorized their distribution into five sets (Supplementary Fig. 8a; solid red line denotes the slowest evolving genes while dashed red lines denote others with faster rates). We found that when a set of genes with slowest evolutionary rate is used, the phylogenetic position within the known chordate grouping is incorrect, suggesting only using the slowest evolving genes generate biases in phylogenetic analysis. This situation can be improved when the genes with an average evolutionary rate are added to the analysis ( Supplementary Fig. 8b).
In addition, to test the effect of sampling size, we performed an analysis by sampling random marker sets of 50 genes. We showed that when the number of genes is under 100, there is an incorrect grouping of chordates, indicating that sampling size causes biases. This can be improved by sampling more than 100 genes (Supplementary Fig. 8c). Interestingly, we also found that in some cases, the phylogenetic position is unstable even using a larger sampling size. This effect was examined further by looking at the gene rate distribution of selected gene sets. On top of that, we showed that this unstable condition is due to accidental sampling of the fast-evolving genes ( Supplementary Fig. 8d). To test whether fast-evolving genes contribute to the variation in interpreting phylogenetic position, we further performed bootstrap support analysis using fastevolving gene sets (with gene rate > 6). Indeed, the unstable relationship can be observed, as in Supplementary Fig. 8e. Our analyses thus suggest that it is worth carefully examining the sample size and the gene rate of selected phylogenetic markers, since these two factors affect the final outcome of the analysis. Taken together, these data also support the greater affinity of Brachiopoda for Mollusca than Annelida.

Lineage-specific domain loss
In addition to molecular phylogenetic analysis, we examined whether Lingula-Mollusca-Annelida relationship is supported by using other qualitative traits. First, annotated metazoan proteomes Common domain losses were evident among the three mollusc species and between the two annelids ( Supplementary Fig. 9a). Fourteen and twelve shared loses were detected between Lingula and Crassostrea and between Lingula and Pinctada, respectively. In contrast, only one and three common losses were detected between Lingula and Capitella and between Lingula and Helobdella, respectively ( Supplementary Fig. 9a). These results indicate that Lingula is closer to molluscs but not annelids, consistent with the molecular phylogenetic analyses. In addition, during this analysis, we also noticed that the CHRD domain which is an important part of the dorsal-ventral patterning gene, Chordin, has been lost in the pearl oyster (Pinctada) and in annelids ( Supplementary Fig. 9a). This finding, together with the fact that Chordin cannot be found in the Helobdella and Capitella genomes, but is mostly retained in other lineages dating back to cnidarians 135 , supports Chordin loss as a synapomorphic trait in annelids, as previously suggested 136 .
Furthermore, we noticed that the SOUL domain for heme-binding protein and the DAP domain for Death-associated protein are lost in annelids (See Supplementary Table 6 for a full description of 22 lineage-specific domain losses in the annelids). Functional classification on GO biological process of these 22 lost domains in annelids showed that they are mainly involved in metabolic and cellular processes (Supplementary Fig. 9b). Taken together, these suggest that annelids have specific metabolic needs and that stress responses are different from those of molluscs and brachiopods.

Microsynteny analysis
To gain further insight into the evolutionary history of lophotrochozoans 8 , we conducted a microsynteny analysis of the Lingula genome in comparing with Branchiostoma floridae (amphioxus), Lottia, and Capitella. First, ortholog groups among these bilaterians were identified using OrthoMCL with proteomes downloaded from UniProt. Next, genome annotation (i.e., general feature format (GFF) file) and transcript fasta files with corresponding headers to given GFF files were retrieved (Supplementary Table 1). The relationship between each UniProt protein and the transcript was identified with BBH method. Finally, locus information for each conserved ortholog groups among selected bilaterians was acquired with custom Perl scripts. For ortholog groups with human counterparts, human IDs were used to represent the ortholog name, whereas ortholog group IDs were used for lophotrochozoan-specific genes.
We found that for lineage-specific syntenic blocks with at least 3 genes, Lingula shares  Fig. 10b). In addition, we found a similar case in which a conserved orthologous cluster was shared by Lingula, Lottia, and Branchiostoma, but not Capitella, although there were genes inserted between CTBP1 and MAEA in Branchiostoma ( Supplementary Fig. 10c). Interestingly, we found many NTBs that were presented only in Lingula and Lottia but not in other genomes ( Supplementary Fig. 10d,e and Supplementary Table   7). Taken together, our data strongly support that Lingula's greater affinity to molluscs than to annelids, consistent with the results of molecular phylogeny and lineage specific domain loss.

Intron structure
The average gene length in the Lingula genome is 6,669 bp, while transcripts average 1,425 nucleotides ( Supplementary Fig. 2). The mean number of introns per gene is 6.6, with an average length of 787-bp. To better understand the evolution of intron size, we compared the genome size, gene size, and intron size of Lingula to those of eight decoded metazoan genomes. We found that there is a weak correlation between genome size and gene size during evolution ( Supplementary Fig. 11a, R 2 =0.5), but a strong positive correlation between gene size and intron size ( Supplementary Fig. 11b, R 2 =0.88). These suggest that during metazoan evolution, one of the main factors affecting gene size is intron size. In addition, analyses of genome size and intron size show that Lingula is more similar to Lottia, than annelids ( Supplementary Fig. 11a,b). To further examine the similarity of intron structure between Lingula and Lottia, we selected 150 one-to-one orthologs used for phylogenetic analyses and analyzed intron structure among Lingula, Lottia, and Capitella. We found that 26 genes in all three genera contain the same number of introns, while Lingula and Lottia share 32 genes with the same number of introns. In contrast, there are only 10 genes shared with the same number of introns between Lingula and Capettela ( Supplementary Fig. 11c and Supplementary Table 10). These results also support the closer relationship of Lingula to molluscs than to annelids.

The disorganized Hox cluster and loss of Lox2 and Lox4
Hox genes, homeodomain-containing transcription factors, play an important role in regulating anteroposterior body axis and appendage development. They are highly conserved among animals, usually with a fixed gene order on the chromosome and a segmented expression pattern according to its physical location in the genome. This property is so termed "colinearity" 137 .
Recent studies have shown that the Hox cluster is surprisingly conserved in bilaterians, suggesting that a single 11-gene Hox cluster is present in the last lophotrochozoan common ancestor 8 .
To study the Hox cluster in Lingula, Hox orthologs were identified by phylogeny of the Hox gene tree (Supplementary Fig. 12a). We found Lingula orthologs for Hox1, Hox2, Hox3, Hox4, Scr, Lox5, Antp, Post1, and Post2. We failed to identify them for Lox4 and Lox2 in our Lingula gene models, despite extensive BLAST searches. When we examined this gene cluster, we found that the Hox cluster is disorganized and broken into two genomic regions. The anterior and central Hox genes lie in a 446-kb long scaffold, whereas the posterior Hox genes reside in a 413-kb long scaffold, respectively ( Supplementary Fig. 12b). There are five non-Hox genes posterior to Lox5, three of them with homology to known sequences (i.e., namely EXOS6, ACTP1, and WSDU1 counted from posterior; Supplementary Fig. S12b, grey boxes).
Interestingly, Antp was rearranged to link with Hox1 in opposite direction (Supplementary Fig.   12b).
Fragmented Hox gene clusters have been reported in many lophotrochozoans, such as Helobdella, Capitella 8 , and Crassostrea 9 . It may be that lophotrochozoans experienced less selective pressure to keep the intact Hox cluster due to their unique body plan. Another finding is that lophotrochozoan Hox genes Lox4 and Lox2 are lost in Lingula ( Supplementary Fig. 12), although these two genes were reported in a previous study 111 . This discrepancy is possibly because Hox gene sequences obtained in the previous study were based on a PCR method. Since they were short and incomplete, this may have caused to an incompatible homology assignment.

Overall gene components (a) Transcription factors
By Pfam domain analysis using custom Perl scripts, we examined components of transcription factor-related domains and their abundance in the Lingula genome, comparing them with 9 selected bilaterians. For example, the Lingula genome contains 37, 16, 27, 9, and 129 genes for those with bZIP, Ets, fork head, GATA, and homeobox, respectively (Supplementary Table 11).
These numbers are comparable to those of molluscs, but different from those of annelids. For instance, the numbers of homeobox genes in three molluscs (Lottia, Crassostrea, and Pinctada) are 140, 117, and 116, respectively, and these are smaller than those of two annelids, 182 for Capitella and 242 for Helobdella (Supplementary Table 11). It is tempting to speculate that the higher number of homeobox genes in annelid lineage may be related to their segmented body plan, which is absent in the molluscs and brachiopods.

(b) Signaling pathway-related molecules
A similar Pfam domain analysis was carried out to determine components of signaling pathwayrelated domains. The Lingula genome contains 4, 5, 7, 15, and 17 genes for those with FGF, Hedgehog, Notch, TGF-beta, and Wnt, respectively (Supplementary Table 12). In general, these numbers are larger than those of molluscs and annelids, suggesting that more complicated cellsignaling-associated regulation may occur in Lingula.

Evolution of Lingula gene families
Gene families are groups of homologous genes that either originate with a speciation event (i.e., orthologs) or a duplication event (i.e., paralogs) which usually have similar functions due to their close sequence identity 138 . The rate of gene duplication within gene families is estimated to be 17-30 genes gained and lost per million years in fruit flies and mammals 134,138 . This gain and loss of gene families has been shown to play an important role in shaping lineage specific traits 134 .
To analyze gene family evolution in lophotrochozoans, we performed all-to-all BLASTP analysis followed by Markov clustering in order to identify orthologous gene groups (OG) with OrthoMCL, according to the standard protocol using a default inflation number of 1.5 121 . We then estimated gene family birth and death by computing the OG with an ultrametric tree generated by the Bayesian method using Computational Analysis of gene Family Evolution (CAFE; v3.1) 139 .
The divergence times were estimated by calibrating geological time according to fossil records 25 .

(a) The 20 most abundant domains in Lingula
The estimated gene number in Lingula (34,105) is larger than that of other lophotrochozoans: Lottia (23,800) 8 , Crassostrea (28,027) 9 , Capitella (32,389) 8 Table 13). We found that the top three were 576 genes with protein kinase domain, 553 genes with protein tyrosine kinase, and 504 genes with seven-transmembrane receptor (rhodopsin family) (Supplementary Table 13). In general, the number of these domains in Lingula is larger than in molluscs and annelids. Next five most-abundant domains in the Lingula genome are all related to Ankyrin repeats (Supplementary Table 13). All of the most abundant domains are involved in cellular processes related to signaling pathways, suggesting that biological regulation in Linugla is more complex than in molluscs.
In addition, CAFE analysis showed that the turnover rate of Lingula gene families is the highest among selected bilaterians. The Lingula genome showed 7,263 gains and 8,441 losses of gene families (Fig. 2b). To better understand evolution of Lingula gene families, we further examined its size structure. The majority of Lingula gene families are small. There are ~6,000 gene families with only one copy and ~4,000 with only 2 copies (Supplementary Fig. 13a). In addition, Lingula has no gene families larger than 50 genes, and no highly expanded gene families were found compared with other lophotrochozoans (Supplementary Fig. 13b,c).
Furthermore, we examined the age distribution of duplicated paralogous genes by estimating their non-synonymous substitution rates (Ks). Among the youngest duplicated genes (Ks < 0.1), we found that Lingula genes duplicate at a rate approximately two to four times higher compared to Lottia (~3.8x) and Capitella (~2.2x) (Fig. 2c). A large portion of these young duplicated genes is undergoing negative selection, suggesting functional constraints on those genes. We also found that genes related to extracellular matrix are experiencing positive selection ( Supplementary Fig. 13d,e), indicating the need to acquire new functions.
These results indicate that the Lingula genome has a unique evolutionary history different from other lophotrochoans. Lingula genes associated with basic metabolism show a slower evolutionary rate, while rapid acquisition and loss of entire gene families have occurred. These findings together with the fact that high gene duplication rate show that the Lingula genome has been actively evolving, contradicting the "living fossil" idea. This decoupling of the molecular and morphological evolution has been also reported in the scorpion, Mesobuthus martensii 151 .

(c) The 20 most expanded gene families in Lingula
The 20 most abundant gene families in the Lingula genome with detectable homology and functional annotation were compared to those of 21 selected metazoan genomes (Supplementary  Table 14). Including CHS8, CHST3 and CEMIP, five of the 20 most expanded families have possible functions in the shell formation, since their high expression level in mantle tissue (shown by asterisks in Supplementary Table 14). GO biological process analysis indicates that the expanded gene families are mainly associated with metabolic processes, localization, and cellular processes (Supplementary Fig. 14b).
CHST3 catalyzes the transfer of sulfate to chondroitin (N-acetylgalactosamine polymer).
Chondroitin sulfate is a major component of the glycosaminoglycans (GAGs) and plays important roles in the extracellular matrix 152 . Interestingly, it has been reported that Lingula shells are composed of large amount of GAGs with the property mimicking an elastic isotropic gel 52 . We found that expanded CHST3 was highly expressed in larvae and mantle tissue, which might be responsible for embryonic shell and adult shell formation, respectively ( Supplementary   Fig. 14c). This suggests that expansion of CHST3 may be related to the unique elastic Lingula shell. In addition, Lingula lines the walls of its burrow with mucus secreted by the mantle 55 . We found that one of expanded gene families, mucin-4 (MUC4; gel-like glycosylated protein), is highly expressed in the larval stage, mantle, and lophophore ( Supplementary Fig. 14d). This finding supports the secretory nature of the mantle. Furthermore, the high expression of MUC4 genes in the lophophore also suggests that mucus may be involved in feeding and defense against pathogens.

(d) Evolution of Lingula chitin synthase genes
Chitin is a linear, long-chain polysaccharide of N-acetylglucosamine, which is the most abundant organic polymer next to cellulose and broadly used by in metazoans and fungi 153 . In ecdysozoans, it can be found mainly in body wall cuticles of crustaceans 154 and insects 155 . Also, it is the major component in mollusc shells 156 , gastropod and chiton radulae 157 , and cephalopod beaks 158 , as well as chaetae (i.e., hair-like sensory bristles) in polychaetes 159 , chitons 160 , and brachiopods 161 . In addition, in many invertebrates, a chitin scaffold structure, peritrophic matrix, lines the midgut and acts as a mechanical barrier against pathogens, as well as facilitating digestion 162 .
Furthermore, not restricted in protostomes, chitinous structures have been reported in the epidermal cuticle of bony fish 163 . Taken together, it is evident that chitin plays crucial roles in animals for the functions of protection, support, feeding, and digestion.
Given that chitin synthase (CHS) genes are the largest expanded gene-family in the Lingula genome, we performed extensive analyses of CHS gene evolution. By combining of BLASTP, OrthoMCL, and KEGG approaches, we identified 31 CHS genes in the Lingula genome. First, we checked the domain combination of CHS genes, we found that most Lingula CHS genes carry only one CHS domain (Chitin_synth2) and others have one in combination with a Sterile alpha motif (SAM) or myosin head domain (Myosin_head) (Supplementary Table 15).
Next, we identified the region of the CHS domain using HMMER (hmmscan). After that, we retrieved the amino acid sequences using custom Perl scripts. The phylogenetic tree of CHS genes was then constructed using conserved CHS domains (358 amino-acid positions). We found that there are two groups of CHS genes, which belong to the metazoan and protostome clades, respectively (Fig. 3a). The expansion of Lingula CHS genes can be found in both clades, in which nine Lingula CHS genes belong to the lophotrochozoan clade (Fig. 3a).
It has been proposed that a myosin-head-domain (MHD) might have fused to CHS genes during lophotrochozoan evolution 164 . Interestingly, we demonstrated that MHD-containing CHS genes occur only in lophotrochozoans, which is in agreement to a previous report 164 . We also found that there is a greater expansion of MHD-containing CHS genes in molluscs than Lingula and annelids (Fig. 3a,b). In molluscs, an MHD-containing CHS gene is expressed specifically in cells that are in close contact with the larval shell 46 and that are probably related to shell formation 165 . Its high expression level during larval shell formation and in the adult mantle further suggests the correlation with mollusc shell formation 9 . In addition, notably, the SAM domain-containing CHS genes are found only in the metazoan clade, in which amphioxus CHS genes are highly expanded. The combination of SAM and CHS domains has been reported in amphioxus 166 and can be also found in corals and sponges (Fig. 3a), suggesting that this combination is likely an ancient character dating back to the metazoan ancestor.
Since Lingula's chitinous structures, such as shells 52 , chaetae 161 , and pedicles 167 are well known, we further examined the expression pattern of these CHS genes. Transcriptome analysis of Lingula CHS genes shows that they are expressed in all adult tissues and in the larval stage (Fig. 3c). The MHD-containing CHS gene is highly expressed in the larval stage and mantle, suggesting that it may also play a role in Lingula shell formation (Fig. 3c). Additionally, CHS genes are highly expressed in the gut and digestive cecum, suggesting that a chitinous peritrophic matrix may also be present in the Lingula midgut (Fig. 3c). The expansion of CHS genes in the Lingula genome and different expression profiles of these genes suggest that chitin plays significant roles in biomineralization and digestion, which should be carefully examined in the future studies.

Biomineralization in shell and bone formation: background
From bacteria to vertebrates, biomineralization is employed to make hard tissues, mostly in the form of calcified minerals with carbonate or phosphate, for protection, support, and feeding [168][169][170] .
Molluscs may be the most successful animal group that forms hard external tissues. Like most other marine invertebrates, mollusc shells are composed of calcium carbonate (i.e., CaCO 3 ) (Supplementary Table 17). The mineral parts constitute more than 90% of the shell weight, and the mass of organic matrix in the shell is usually less than 5% 171,172 .  Table 17).
Similar to mollusc shells, brachiopod shells also consist of three major layers. The outermost layer, periostracum (~4 µm), is an organic layer composed of chitin and organic matrix. The primary layer (~40 µm) is composed of rod and botryoid types of apatite and glucosaminoglycan gels (GAGs; with long unbranched polysaccharides). The secondary layer, the laminated layer (variable in thickness), is composed of apatitic laminae 52,174 . The laminated structure provides flexibility and fracture resistance, which may benefit burrowing 176 . It is worth mentioning that in Lingula there are collagen fibers at the interface of the primary and secondary layers, a feature not shared by molluscs 52,174,177 (Supplementary Table 17).
Biomineralization has been extensively studied but the molecular mechanism remains unknown. The process has been termed as "biologically induced" or "biologically controlled" depending on the degree of biological control involved. The minerals are formed by biologically induced processes if their precipitation is the result of interactions between the organism and the environment, in which cell surfaces and compartmentalized fluid cavities catalyze nucleation and growth of the minerals (i.e., mineralization is initiated by an extracellular organic matrix). On the other hand, the biologically controlled process involves direct control of nucleation, growth, morphology, and location of mineral deposition via intracellular regulation 178 . In humans, for example, cells capable for making calcified tissues, such as cartilage, bone, and dentin, form socalled matrix vesicles, that bud off from specific regions of the plasma membrane and regulate ion concentration and mineral formation intra-cellularly and intra-vesicularly 179,180 . In sea urchins, larval endoskeletons or spicules are formed intra-cellularly in membrane-delineated compartments generated by multiple skeletogenic cells 181 . Skeletogenic cells are able to transform minerals from amorphous calcium carbonate into crystalline calcite 182,183 .
Two models have been proposed for the mechanism of mollusc shell formation. The matrix-secreted model (i.e., biologically induced) suggests that the mantle epithelial cells secrete shell matrix proteins and ions into a compartment (i.e., extrapallial space) where the minerals are formed 171,184 , whereas various tissues may also contribute to this secretion process 185 . In the cellmediated model (i.e., biologically controlled), cells (e.g., granulocytic hemocytes in case of oysters) form the minerals intra-cellularly, in which crystal nucleation is initiated under cellular regulation 9,186 . Taken together, it is reasonable to hypothesize that these two models might both be involved in the biomineralization during shell formation.
Even though there is a lot of interest in mollusc shell formation, the evolutionary origin of mollusc shells is unclear. Studies of mollusc mantle transcriptomes and shell proteomes suggest that gene sets responsible for formation of calcium-carbonate-based calcite or aragonite are evolved rapidly. Mineral homology among molluscs might be the result of parallel evolution, since their "toolkit" genes of many species are so diverse [187][188][189] . Supporting this view, new shell matrix proteins may have originated from gene duplication events, in which those genes were initially responsible for general functions and were later co-opted for calcification 190 . One interesting proposition is that horizontal gene transfer from bacteria may also have contributed to the rapid neofunctionalization of biomineralization gene sets during early metazoan evolution 191,192 , although this idea is still a matter of debate.
In contrast to studies of mollusc shell formation, the origin of the Lingula shell is largely unknown. Although some Cambrian arthropods, tommottids, and various other problematica also used calcium phosphate for their skeletons 193 , one intriguing observation is that lingulid brachiopods and craniates (i.e., head vertebrates) are the only two well-characterized groups of extant animals that utilize calcium phosphate minerals 168 . Given that vertebrate bones are made up of hydroxyapatites (i.e., Ca 10 (PO 4 ) 6 (OH) 2 ), fibrillar collagens, and GAGs 194 , which are similar in composition to Lingula shell 195 (Supplementary Table 17), it is tempting to wonder whether the mechanism of biomineralization between these distant phyla shares a common origin.
However, using solid-state nuclear magnetic resonance spectroscopy and X-ray diffraction, a recent study found that Lingula shell has higher mineral crystallinity and shows no GAG-mineral interaction compared to vertebrate bone 196 . Comparison of ultrastructure by electron diffraction confirmed the higher crystallinity and also determined that carbonate content is lower, in contrast to vertebrate bone 197 . These findings cast doubt on the idea that Lingula shell and vertebrate bones involve the same gene sets. Genomic scale comparisons of biomineralization genes among Lingula, molluscs, and vertebrates may provide interesting insights into the molecular mechanism and evolutionary origin of the Lingula shell.

Properties of Lingula mantle revealed by transcriptome analysis
To characterize genes that might be involved in Lingula shell formation, seven adult tissues were collected for RNA-seq (Supplementary Table 4). Transcript expression level was calculated as FPKM using Trinity built-in scripts with RSEM 94,96 . A Venn diagram was plotted using jvenn to identify mantle-specific genes. GO enrichment analysis, such as molecular functions and biological processes of mantle-specific genes, was conducted with DAVID 149 and PANTHER 150 .
Given that mantle epithelium is the place where shell formation occurs, genes that are specifically or highly expressed in the mantle may participate in biomineralization. We found that among five adult tissues, there are 2,724 genes specifically expressed in the mantle ( Supplementary Fig. 15a). GO enrichment analysis showed that these genes are responsible for cell surface receptor signaling and cell adhesion. They encode extracellular or integral membrane proteins, such as G-protein receptors ( Supplementary Fig. 15b,c and Supplementary Table 18).
Notably, they contain domain features like EGF, sulfotransferase, neuropeptide binding, and others. (Supplementary Table 18). These data indicate that the Lingula mantle is an actively secreting organ that expresses specific sets of glycoproteins and extracellular matrix proteins.
This is similar to a previous report showing that 25% of mollusc mantle genes encode secreted proteins 187 . Our results also support the proposal that the appearance of calcified tissues at the Precambrian-Cambrian transition might have originated from reorganization of preexisting secretory machinery 198 . In addition, we found genes related to respiratory gaseous exchange enriched in the mantle, which might relate to the mantle canal, a unique circulation organ in brachiopods 199 ( Supplementary Fig. 15b).
Besides searching for genes that are specifically expressed in the mantle, we also analyzed genes that are more highly expressed in the mantle than in other tissues. We found that collagen and zonadhesin are the two mostly highly expressed genes in the mantle (Supplementary   Table 19). In addition, many calcium ion-binding proteins are highly expressed in the mantle, such as calmodulin, calponin, EGF domain-containing protein, and uromodulin (Supplementary Interestingly, calponin is highly expressed in the pearl oyster mantle 35 and the pearl sac 201 , suggesting that it plays a role in calcification. Furthermore, we noticed that one mucin gene is highly and specifically expressed in the mantle (Supplementary Table 19). Mucin genes have been found in coral skeleton 202,203 and in mussel shell 49 , suggesting that they play a conserved and ancient role in hard tissue formation in metazoans.

(a) Spearman's and Pearson's correlation coefficients
Given the close evolutionary history between Lingula and molluscs, we next examined whether Lingula tissues also share molecular similarity in transcriptomes with those of molluscs. RNAseq raw reads of selected adult tissues from the Pacific oyster Crassostrea gigas, which are comparable to those of Lingula, were downloaded from OysterDB (http://oysterdb.cn/) 9 and reassembled with Trinity 94 . Orthologous genes were identified using a BBH approach 123  When comparing Spearman's ρ and Pearson's r, we found similar trends of the correlation pattern, indicating that it is appropriate to use either of these coefficients for our analyses ( Supplementary Fig. 16).

(b) Transcriptome similarities between Lingula and Crassostrea
When compared intra-specifically, we found that the Lingula mantle transcriptome is most similar to those of lophophore and pedicle, while the Crassostrea mantle transcriptome most resembles those of labial palp and gill ( Supplementary Fig. 17a,b). Interspecific comparisons showed that Lingula mantle is related to Crassostrea mantle ( Fig. 4a  PC vs Amu). Interestingly, our analysis revealed that Lingula lophophore shares high similarity with Crassostrea gill (Fig. 4a and Supplementary Fig. 17c,d; LP vs Gil). This is likely due to the lophophore's role in collection of food and gas exchange 207 .
To further explore the functional similarity of mantles, we categorized each orthologous gene pair by calculating their percent difference (PD), in which two values ( and ) are compared at log scale in the following manner: √ By applying GO enrichment analyses to different PD subsets, we found that the expression profiles of genes involved in ribosomal machinery are most similar, while those of genes related to chromosome and cell cycle regulation are diverse ( Supplementary Fig. 18).
Genes related to membrane trafficking are expressed in a highly similar pattern between Lingula and Crassostrea mantles, suggesting that the functional similarity mainly comes from genes involved in secretory machinery ( Supplementary Fig. 18a,c).

Comparative genomics of genes associated with biomineralization
We next examined the known biomineralization-associated genes in the Lingula genome. Using recent published resources on bone evolution in the elephant shark, Callorhinchus milii 208

(a) Genes associated with vertebrate bone formation
A recent study of the elephant shark genome reveals that innovation of acidic secretory calciumbinding phosphoprotein (SCPP) gene family holds the key to vertebrate bone formation 208 Fig. 19c). Phylogenetic analysis of Lingula SPARCrelated genes demonstrated that Lingula has only one SPARC gene, and the other one is an ortholog of SPARC-related modular calcium-binding protein (SMOC1/2) (Supplementary Fig.   19d). This finding suggests that Lingula does not have SCPPs that arose from SPARCL1.
Next, we examined 175 vertebrate bone formation genes in selected metazoan genomes.
We found that many genes involved in vertebrate bone formation are derived from genome duplication events in the vertebrate lineage. For most of these genes, Lingula shares similar number of homologs to other marine invertebrates; there is no unusual similarity between Lingula and humans (Fig. 4b Tables 21 and 22). Consistent with the SPARC analysis, we failed to find the key bone formation genes SCPPs in the Lingula genome. Taken together, our data suggest that Lingula and bony vertebrates independently evolved their own mechanisms for hard tissue formation, as did sea urchins 214 .

(b) Genes associated with mollusc shell formation
On the other hand, a comparative study of 90 mollusc shell formation-associated genes showed that Lingula shares most of the common "toolkits" with sea snail and oysters, but there are also many oyster-specific genes that cannot be found in other bilaterians (Fig. 4b, Mollusc shell formation-related proteins). Further analysis of these genes revealed that many so-called shell formation genes are also shared with humans ( Supplementary Fig. 20a). GO functional classification showed that these 30 core-shared genes are mainly related to cellular and metabolic process, localization, and biological regulation ( Supplementary Fig. 20b). In addition, transcriptome analysis in Lingula adult tissues demonstrates that expression of these shared genes is not limited to the mantle and many of them are not expressed. These results suggest that many shell formation genes have been co-opted for mollusc shell formation independently, while they carry out different functions in other bilaterian lineages (Supplementary Table 23). Notably, there are eight genes shared between Lingula and molluscs; five of them exhibited high expression in larvae and the mantle. These include genes for calcium-dependent protein kinase 35  . These genes may also be involved in Lingula shell formation.

(c) Genes associated with spider silk formation
Lastly, we could not detect any spidroin-like protein genes in the genomes that we compared.
When searching for silk proteins in the Lingula genome, no homolog with sequence similarity was found (Supplementary Table 20, Spider silk proteins). This suggests that silk formation is unlike that of Lingula shell, although there are proteins with alanine-rich regions shared in shell matrix 219 and silk proteins 220 . Given that alanine-rich proteins are the main constituents of the Lingula shell 177,221 , it is possible that Lingula evolved poly(alanine) silk-like proteins independently to develop shell extensibility, which may play an important role in their burrowing lifestyle.

Conserved molecular mechanisms in metazoan biomineralization
Although the Lingula mantle is similar to that of molluscs, we cannot exclude the possibility that the similarities might represent nothing more than the sharing of common secretory cell types.
The question remains whether conserved molecules and mechanisms exist for shell formation. To resolve this issue, we focused on one of the ancient metazoan signaling pathways, bone morphogenetic proteins (BMP; or Decapentaplegic, Dpp). BMPs are signaling ligands belonging to the transforming growth factor-β (TGF-β) superfamily. The BMP pathway has been conserved for dorsal-ventral patterning in bilaterians 222 and for symmetry breaking in cnidarians [223][224][225] .
These results suggest that it has an ancient role in regulating the body plan. Intriguingly, BMP signals are also required for bone formation in vertebrates 28 , shell formation in molluscs 226 and skeleton formation in corals 227 .
To visualize activation sites of BMP signals, we applied immunostaining of nuclear phosphorylated Smad1/5/9 (pSmad), an activated mediator for the signaling 229 . The expression profile of Bmp2/4 is coincident with the nuclear pSmad signals, suggesting that activation of BMP signaling requires Bmp2/4 expression ( Supplementary Fig. 21c). The commercial pSmad antibody is produced from a synthetic phosphopeptide corresponding to residues surrounding Ser463/465 of human SMAD5, and cross-reacts with human SMAD1 and SMAD9. To validate the specificity of pSmad staining in Lingula, we compared C-terminal sequences of Smad proteins in selected metazoans. The alignment shows that C-terminus of Lingula Smad1/5/9 is identical to human SMAD1 and SMAD9 (Supplementary Fig. 21d). This observation suggests that the commercial pSmad antibody may also be useful in lophotrochozoans ( Supplementary   Fig. 21d). Furthermore, we observed that nuclear pSmad signals start to appear at the early blastula stage (Supplementary Fig. 21c). The signals are strongest at the early gastrula stage showing an asymmetrical pattern. This indicates that BMP signaling may play a role in axial patterning in Lingula (Supplementary Fig. 21e). Thus, the temporal correspondence between staining signals and Bmp2/4 expression, as well as their nuclear localization and asymmetrical pattern in the embryo, argue strongly against the possibility of non-specific binding. More detailed studies will be required to address the function of BMP signaling during Lingula embryogenesis.
In Lingula, embryonic shells are formed upon mantle lobes starting at the 1-pair-cirri larval stage 54 . Interestingly, at different larval stages, we found that BMP signals are activated at the anterior margin of the mantle lobe, suggesting that the signal may be involved in embryonic shell formation ( Fig. 5 and Supplementary Fig. 21f, arrows). In gastropods, Bmp2/4 is expressed in posterodorsal ectoderm along the mantle edge 27,31,232 . On the other hand, in bivalves, Bmp2/4 is expressed in the shell field and shell field invagination 233 . Given that BMP signals are activated at the margin of Lingula mantle lobes, these findings suggest that BMP signaling may play a conserved role in biomineral formation in the metazoan common ancestor. Further analyses of how BMP signaling regulates embryonic shell formation in Lingula will be informative to understand the evolution of biomineralization.

The Lingula shell matrix proteome (a) Identification and characterization of Lingula shell matrix proteins (SMPs)
Proteomic approaches have recently been introduced into the field of mollusc biomineralization, where they provide powerful tools to identify novel shell matrix proteins (SMPs) 47,[234][235][236] . The mantle epithelium has multiple functions. In addition to shell formation, it is also responsible for mucus secretion, light sensing, and circulation. To identify Lingula SMPs that are possibly directly involved in shell formation, we conducted proteomic analysis of the matrix proteins from the Lingula shell ( Supplementary Fig. 22a). We found a total of 231 putative SMPs by retrieving gene models with high-quality peptide hit(s). To avoid contamination from other tissues or cells, we identified SMPs by applying the following strategy.
First, we classified putative SMPs by their solubility and found that most of them were in the acid insoluble fraction (Supplementary Fig. 22b; 146 acid insoluble proteins, 46 acid soluble proteins). Next, we found that most of putative SMPs had only one unique peptide hit ( Supplementary Fig. 22c) and many of them lack signal peptides ( Supplementary Fig. 22d).
Using a GO statistical overrepresentation test, we showed that selection of putative SMPs with unique peptide hits (>1) and with signal peptides significantly enriched proteins that are related to extracellular matrix ( Supplementary Fig. 22e). In addition, it has been reported that tandem duplication often occurs in genes related to biomineralization 194 . To select the final set of SMPs, we then applied the combination of genes with unique peptide hits (>1), with signal peptides, and those showing tandem duplication of the scaffold (Supplementary Fig. 22f). Finally, we identified 65 SMPs in the Lingula shell proteome, 51 of which are present in all metazoans, and 14 are Lingula-specific, without counterparts in any other organism.
Characteristics of these SMPs such as domain composition, pI, and percentages of amino acid are given in Supplementary Tables 25 and 26 for those with homologies and for novel ones, respectively. Unexpectedly, we could not find secreted acidic proteins (pI<4.5) 237 Fig. 23a).
Through an examination of amino acid composition, one of the main characteristics of Lingula shells compared with other articulate brachiopods or molluscs is that their SMPs contain a large amount of glycine and alanine 52,177,221 . To support previous observations, we provided the first molecular evidence to show that glycine-rich SMPs are collagens (Supplementary Table 25,   G%>20). In addition, we also found that many novel SMPs are alanine-rich and in low molecular weight (~10-20 kDa, amino-acid length ~100-200) (Supplementary Fig. 23b and Supplementary  Fig. 23c and Supplementary Table 27). The domain composition suggests that the shell matrix is derived from extracellular matrix 238 .
We next examined the expression profile of these SMPs. We found that 26 Table 28). Many of these genes function as extracellular enzymes and ion binding sites in humans, suggesting that they are probably co-opted in Lingula for shell formation. Their expression in both the mantle and the shell implies that they may be directly involved in biomineralization. We also showed that five SMPs are weakly expressed or have no expression in the mantle, suggesting that they have been deposited into the shell matrix in the earlier event of the production (Supplementary Fig. 24c). All 14 Lingula-specific SMPs are highly or specifically expressed in the mantle, indicating specific roles in shell formation ( Supplementary Fig. 24d).
Taken together, nearly one third of SMPs are expressed ubiquitously, while half of them are expressed specifically in the mantle (Supplementary Fig. 24e).

(b) Comparative genomics of Lingula SMPs
When Lingula SMPs are compared to those of other bilaterians, we found that most of the Lingula shell proteins are highly specific, and are not present in either molluscs or vertebrates (Fig. 4b, Lingula shell matrix proteins). To gain insights into the evolutionary origins of mineral formation genes, we excluded SMPs that are present only in the Lingula lineage (i.e., novel) or shared by all other animals. After filtering, we identified 29 SMPs, which were further analyzed by comparing them with those found in 12 selected metazoan genomes. By comparative genomics, we found that the composition of Lingula SMPs shared homology mostly with those of amphioxus and molluscs ( Supplementary Fig. 25). These data are consistent with those of the whole genome comparison with bone formation genes (Fig. 4b, Vertebrate bone formation).
Regarding the phylogenetic debates on the relationship of brachiopods, molluscs, and annelids ( Supplementary Fig. 6), we searched for SMPs that are only shared by Lingula and annelids; however, we found none. Instead, we discovered 11 SMPs that were lost in the annelid lineage, but that have been retained in the other lineages. Taken together, analyses of the subset of SMPs indicate a close relationship between Lingula and molluscs, suggesting that some of the SMPs already existed before the common ancestor of Lingula and molluscs.

(c) Novel Lingula SMPs
Recent proteomic studies of molluscan shells have shown that both highly conserved and lineagespecific genes are expressed in the shell matrix 47,234 , suggesting that each mollusc lineage may use different genes for shell formation, according to environmental conditions and genetic context. One important finding of our shell proteome study is that Lingula carries a lot of lineagespecific SMPs. Careful examination revealed that some of these SMP genes have tandem duplicated architecture in the genome. One example is an alanine-rich gene family that has three copies arranged in tandem on the same scaffold ( Supplementary Fig. 26a). These novel secreted alanine-rich proteins contain conserved 4-5 poly(alanine) blocks and GYGY motifs ( Supplementary Fig. 26b).
Poly-alanine proteins are usually found in silk proteins with poly(glycine-alanine) or poly(alanine) motifs 239 . It is proposed that the repetitive poly(alanine) motifs in the silk protein are able to fold into β-sheet, forming highly oriented alanine-rich crystals 220 . Intriguingly, similar alanine-rich SMPs have also been found in oysters. But in comparison with the 4-8 poly(alanine) blocks in silk proteins, oyster SMP, MSI60, has 9-13 poly(alanine) blocks, which may contribute to pack crystals more densely 219 . Another oyster SMP, Shelk2, has 7-8 poly(alanine) blocks. This protein is expressed in the fresh shell framework structure prior to shell regeneration 240 . Lingula alanine-rich SMPs have 4-6 poly(alanine) blocks, which are more similar to those of silk proteins than of oysters. To gain more insight into the function of these novel proteins, we predicted their 3D structure with I-TASSER 109 . Interestingly, we found that the top-scoring predicted structure is similar to that of a recently designed artificial monomeric three-helix bundle (Supplementary Fig.   26c; C-score=-2.72), which has high thermodynamic stability 241 . It is likely that properties of this novel helix protein contribute to the unique features of the Lingula shell. Further studies on this protein will be needed to elucidate its role in shell formation.

Evolution of Lingula fibrillar collagen (a) Phylogeny of fibrillar collagens
Bone formation in vertebrates relies on depositing apatite crystals on fibrillar collagens 242 . Under scanning electron microscopy, Lingula shells show collagenous fibrils associated with GAGs 52 .
Collagen fibers are not detected in the shell proteome of the Pacific oyster, suggesting that mollusc shells are not composed of fibrillar collagen 9 . Given that biominerals with fibrillar collagens are one of the characteristics shared by Lingula and vertebrates, using phylogenetic analysis on the evolution of fibrillar collagen, we tested whether they shares a common origin of biomineralization with vertebrates.
Vertebrate fibrillar collagens can be grouped into three major groups (Clade A, B, and C) carrying COLFI domains 194 (Fig. 6a). Our analysis shows that the fibrillar collagens used for shell formation in Lingula do not have COLFI domains. Instead, they comprise a new group of collagens with EGF-like domains, which do not belong to the vertebrate type of fibrillar collagens (Fig. 6a,b). These new types of collagen are expressed in the shells and mantle, suggesting their direct involvement in shell formation ( Fig. 6a and Supplementary Table 28).
This finding is consistent with the previous observation that the ultrastructure of Lingula shell collagen fibers is different from that of vertebrates 174,177 . Notably, some fibrillar collagen genes likely arose by tandem duplication (Fig. 6c).

(b) Shuffling of EGF and Collagen domains
It has been shown that domain shuffling contributes to the evolution of lineage-specific characteristics in vertebrates 243 , fruit flies 244 and corals 202 . Given that Lingula collagens carry a new domain combination, which is not found in vertebrate-type fibrillar collagens, we analyzed the domain shuffling based on EGF and collagen domains. Supplementary Tables 29 and 30 summarize the most abundant domains combined with EGF and collagen domains, respectively, in Lingula, humans, and molluscs (sea snail, Pacific oyster, and pearl oyster). We found that Lingula contains 17 genes encoding proteins with combination of EGF and collagen domains (Supplementary Table 29), the number of which is the highest among bilaterians ( Supplementary   Fig. 27a). Four of these 17 EGF domain-containing collagens are found in the shell matrix proteome (Fig. 6b). Further analyses of domain combinations showed that Lingula carried higher number of EGF domain-containing proteins and these with domains combined with EGF domain in others bilaterians ( Supplementary Fig. 27b). On the other hand, the number of collagen domain-containing proteins in Lingula is higher than in molluscs but similar to annelids ( Supplementary Fig. 27c). In addition, we found that 11 of the 20 most abundant domains combined with EGF domains are commonly shared by other bilaterians, whereas a collagen domain is specially linked to EGF domains in Lingula ( Supplementary Fig. 27d)

Evolution of bilaterian biomineralization (a) Biomineralization mechanisms in Lingula
We have demonstrated that Lingula used its own gene sets to originate their calcium phosphate chemistry that is different from the set used by vertebrates. In addition, we have shown that there are lineage-specific SMPs in Lingula and molluscs, respectively. A schematic summary of genes involved in Lingula shell formation identified by this study is given in Fig. 7. References supporting the illustration are provided in Supplementary Table 31. We proposed that the metazoan ancestor used a core of ancient signaling proteins to initiate the biomineralization process. We speculated that this involves canonical BMP signaling, in which BMP ligands bind to its receptor, from which a signal is transduced by the regulatory and co-mediator, pSmad1/5/9 and Smad4, respectively. They then act as transcription factors, interacting with other proteins to activate the expression of downstream biomineralization genes (Fig. 7, proteins in green). The other conserved transcription factor is engrailed, which is involved in both bone and shell formation ( Fig. 7; Supplementary Table 31, Shell and bone formation).
In addition, many calcium binding proteins (e.g., calcineurin, calponin, and calmodulin) and extracellular matrix proteins (e.g., cadherin, collagen, and fibronectin) have been reported to participate in bone and shell formation (Fig. 7, Supplementary Table 31). This implies that metazoan biomineralization likely originated from a calcium-regulated extracellular matrix system. Furthermore, we also discovered that Hox4, tyrosinase, chitin synthase, perlucin, chitinase, peroxidasin, mucin, and VWA protein are common shell formation-associated components shared by Lingula and molluscs (Fig. 7, proteins in orange; Supplementary Table 31, Shell formation), suggesting that this fundamental gene set has been used by their last common ancestor, estimated to be approximately 600 MYA 25 (Fig. 2b).
Additionally, Lingula shared with vertebrate genes associated with bone formation including carbohydrate sulfotransferase and fibrillin (Fig. 7, proteins in blue; Supplementary  202,203,246 . It implies that these extracellular ion-binding proteins in the biomineral matrix may either be the common features of metazoans that have been lost in vertebrate bones and mollusc shells, or that they arose independently in Lingula and corals. Notably, Lingula-specific proteins such as EGF domain-containing fibrillar collagens and alanine-rich proteins may represent the original genes for calcium phosphate-based biomineralization 57 . The duplication of carbohydrate sulfotransferase, chitin synthase, fibronectin, and mucin genes may also contribute to unique features of Lingula shells (Fig. 7, proteins in dashed outlines). Taken together, our genomic, transcriptomic, and proteomic analyses of Lingula biomineralization show similar patterns to those in molluscs 188 and corals 202 , where co-option, domain shuffling, and novel genes are the fundamental mechanisms for metazoan biomineralization.
In conclusion, we proposed possible mechanisms for Lingula shell formation (Fig. 7).

(b) Evolutionary scenarios of biomineralization
Although fossils of conodont elements might be the first mineralized skeletons of vertebrates dating back to the late Cambrian (~515 MYA) 247 , their affinity to the vertebrate teeth is uncertain 248 . Thus, the first vertebrate mineralized bones (i.e., endoskeletons) appeared in the late Ordovician (~450 MYA) 208 much later than lingulid shells (~520 MYA, early Cambrian) 65 .
Together with the distant phylogenetic relationship of vertebrates and Lingula, it is perhaps not surprising that bones and shells shared different genetic origins. In fact, recent discoveries from Cambrian fossils have changed our ideas about evolution of early molluscs and animal biomineralization. For example, a non-mineralized cephalopod fossil, Nectocaris, found in Burgess Shale (~508 MYA, middle Cambrian) suggests that a mineralized shell is a derived character of cephalopods 249 . On the other hand, phylogenomic studies of mollusc phylogeny show that shells may have multiple origins 5,6 , which is in agreement with the proteomic studies of mollusc shells [187][188][189] . Extant molluscs can be divided into two major groups, Conchifera (shellbearing; Gastropoda, Bivalvia, Scaphopoda, Cephalopoda, and Monoplacophora) and Aculifera (worm-like; Neomeniomorpha, Chaetodermomorpha, and Polyplacophora) 5 . Although conchiferans make shells and aculiferans have only sclerites, both of them use calcium carbonate.
While brachiopods have adopted different modes of biomineralization, only the Linguliformea makes shells with calcium phosphate 52 ( Supplementary Fig. 28a).
In the light of the close phylogenetic relationship between Lingula and molluscs, we hypothesized evolutionary scenarios for the primitive mode of biomineralization in their common ancestors. By comparing chemical and molecular features, three possible primitive modes are presented ( Supplementary Fig. 28b-d). First, we propose that calcium phosphate might be the primitive mode of biomineralization, since lingulid brachiopod fossils are abundant in the early Cambrian ( Supplementary Fig. 28b). However, this implies a huge number of secondary losses in other lineages, which makes this hypothesis less attractive. On the other hand, calcium carbonate might be primitive, because it is the mode that has been used by most extant brachiopods and molluscs. Relatively few losses are required to fulfill this scenario ( Supplementary Fig. 28c).
Nevertheless, calcium phosphate and carbonate biominerals appeared almost at the same time during the Cambrian explosion 171 . Although the mollusc-like fossil, Kimberella, was found before the Cambrian 250 , there is no clear evidence which mode of the biominerals appeared first.
Perhaps the ancestor of lophotrochozoans was non-minerlaized. Supporting evidence comes from another mollusc-like fossil, Odontogriphus, in the middle Cambrian. Considered as a stem-group lophotrochozoan, it was shell-less and possessed putative radulae 251 . Thus, we argue that calcification might be a derived feature in molluscs and brachiopods, in which chitin in the shell may be a synapomorphic character shared by their ancestors. Chitinous scaffold may provide the organic framework for interactions between extracellular matrix and mineral ions ( Supplementary Fig. 28d). This idea is supported by data from the embryonic shell of molluscs, where a chitin scaffold is crucial for shell formation 165 . More interestingly, chitin and chitin synthase genes were recently found in vertebrates, expressed in epithelial cells of fishes and amphibians 252 . These suggest an ancient evolutionary origin of epidermal chitin in bilaterian ancestors. The ancestral composition of animal biominerals remains to be resolved. Further comparative genomic and functional studies of lophotrochozoans, such as brachiopods, phoronids, and molluscs will be needed to resolve this question.