Transcriptome analysis of the bivalve Placuna placenta mantle reveals potential biomineralization-related genes

The shells of window pane oyster Placuna placenta are very thin and exhibit excellent optical transparency and mechanical robustness. However, little is known about the biomineralization-related proteins of the shells of P. placenta. In this work, we report the comprehensive transcriptome of the mantle tissue of P. placenta for the first time. The unigenes of the mantle tissue of P. placenta were annotated by using the public databases such as nr, GO, KOG, KEGG, and Pfam. 24,343 unigenes were annotated according to Pfam database, accounting for 21.48% of the total unigenes. We find that half of the annotated unigenes of the mantle tissue of P. placenta are consistent to the annotated unigenes from pacific oyster Crassostrea gigas according to nr database. The unigene sequence analysis from the mantle tissue of P. placenta indicates that 465,392 potential single nucleotide polymorphisms (SNPs) and 62,103 potential indel markers were identified from 60,371 unigenes. 178 unigenes of the mantle tissue of P. placenta are found to be homologous to those reported proteins related to the biomineralization process of molluscan shells, while 18 of them are highly expressed unigenes in the mantle tissue. It is proposed that four unigenes with the highest expression levels in the mantle tissue are very often related to the biomineralization process, while another three unigenes are potentially related to the biomineralization process according to the Quantitative Real-Time Polymerase Chain Reaction (qRT-PCR) analysis. In summary, the transcriptome analysis of the mantle tissue of P. Placenta shows the potential biomineralization-related proteins and this work may shed light for the shell formation mechanism of bivalves.

www.nature.com/scientificreports/ Total RNA extraction, cDNA library preparation and Illumina sequencing. Total RNA was extracted from the mantle tissue of P. placenta by using the Quick RNA Isolation Kit (Biotech Corporation, Beijing, China). The concentrations of RNA were quantified with a Qubit 2.0 Fluorometer (Invitrogen, Life Technologies, CA, USA), and the RNA integrity was evaluated using an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). The weight of the mantle tissue for RNA extraction was no less than 300 mg for each sample and the experiments were repeated for three times. The cloned DNA (cDNA) library construction and sequencing analysis were performed at Beijing BioMarker Technologies (Beijing, China). The cDNA library was constructed as follows. In the case of eukaryotes, polyadenylation (poly (A)) tail messenger RNA (mRNA) was purified with oligonucleotides (dT) magnetic beads from total RNA, and then the mRNA-enriched RNA was randomly segmented into small fragments in a divalent cation fragmentation buffer (Illumina, Hayward, CA). These short fragments were used as templates to synthesize the first strand cDNA using random hexamer primers. The second strand cDNA was generated using RNaseH and DNA polymerase I. The double stranded cDNA was purified by beads (Beckman, CA, USA) and washed with elution buffer for end repairing and tailing A. Afterwards, these short fragments were ligated to sequencing adapters according to Illumina's protocol (SanDiego, CA, USA) and were then separated by agarose gel electrophoresis.
In order to obtain the cDNA library, suitable fragments (300-500 bp) were selected as templates for PCR amplification. The effective concentration of the cDNA library (> 4nM) was accurately quantified by the quantitative real-time PCR (qRT-PCR) method. Finally, the cDNA libraries were sequenced on an Illumina HiSeq TM 4000.
De novo assembly and sequence annotation. Before de novo assembly, raw data were saved in fastq format and filtered for downstream analysis using in-house pear script, which mainly contains three filter steps: (1) discard sequencing adapter; (2) remove reads with more than 10% of unknown bases; (3) filter low-quality reads which contain more than 50% of low-quality bases (Q ≤ 20). After the above filtering processes, de novo assembly was carried out using Trinity with min_kmer_cov set to 3 by default and all other parameters set to their default values 40 .
The coding regions of the assembled unigenes also known as the coding DNA sequence (CDS), were predicted by two steps. First, the unigene sequences from the mantle tissues of P. placenta were successively translated and compared to the NCBI non-redundant protein sequences (nr database) and a manually annotated and reviewed protein sequences (Swiss-Prot) database. Then, the Basic Local Alignment Search Tool (BLAST) analysis was used to annotate the functions of the protein sequences based on nr and Swiss-Prot databases with a cut-off E-value of 10 -5 . The protein sequences are actually amino acid sequences of proteins. These two proteins database have correct sequence to predict the coding regions of the unigenes. Second, The CDSs were determined using ESTScan with default settings if the unigenes could not be aligned to any entry of the above mentioned two databases 41 . ESTScan software was used to determine the open reading frame and obtain the nucleic acid and amino acid sequences from the unannotated unigenes. Finally, all annotated unigenes were aligned to the public databases, including the euKaryotic Ortholog Groups (KOG, http:// www. ncbi. nlm. nih. gov/ COG/), Kyoto Encyclopedia of Genes and Genomes (KEGG, http:// www. genome. jp/ kegg/), Gene Ontology (GO, http:// www. geneo ntolo gy. org/) and Protein family (Pfam, http:// pfam. sanger. ac. uk/) databases.

SNP, Indel, SSR markers identification and primer design. Single nucleotide polymorphisms (SNP)
refers to genetic markers formed by mutations of the single nucleotide on the genome. There are a large number of SNPs with abundant polymorphisms. Generally, SNP is defined as single nucleotide mutations with a mutation frequency more than 1%. Indel refers to the insertion and deletion of small fragments in the sample genome relative to the control genome. The small fragments may contain one or more bases. We identified SNPs and indels by using samtools, picard-tools and GATK2 software 42 .
Simple sequence repeat (SSR), also known as short tandem repeats and microsatellite markers, which contains the number of repeat motifs with less than six nucleotides (di-, tri-, tetra-, penta-, and hexa-nucleotide). The repeat motifs exist often in the genome of eukaryotes. Unigenes sequences were used to search for potential SSR markers using MISA software (version 1.0, http:// pgrc. ipk-gater sleben. de/ misa/ misa. html) with the default parameters. Afterwards, Primer 3 software (version 2.3.5, http:// fokker. wi. mit. edu/ prime r3/ default parameters) was applied for SSR primer design.

Identification of unigenes involved in biomineralization and comparison of transcriptomes.
Biomineralization-related genes have been reported in the mantle tissues of molluscans such as prismalin-14 43 , pif 44 , amorphous calcium carbonate binding protein 45 , and asporin 46 . Identification of the potential biomineralization-related unigenes were carried out by keyword searching according to the reported expressed unigenes in the nr database which were reported to be related to biomineralization. In general, three kinds of unigenes were considered as potential biomineralization-related unigenes, well-known unigenes that have been reported to play important roles for the shell formation, the reported unigenes which are highly homologous with the reported shell formation related unigenes, the proteins those were found in mollusk shells and had high expression in the mantle tissues of molluscan animals. All of the above-mentioned unigenes may probably play key roles for the shell formation. If several unigenes were assigned with the same reference gene, the unigene with the lowest E-value and highest score was assumed to be the homolog of the reference gene.
To compare the similarities and differences between the mineralization proteins of various molluscan animals, the transcriptome annotations of P. placenta were compared with the mantle transcriptome annotations from two other molluscans, P. yessoensis 36  Quantitative real-time polymerase chain reaction (qRT-PCR) analysis. In order to determine the expression levels of the potential biomineralization-related genes in different tissues of P. placenta, the unigenes from seven candidate tissues were characterized by using qRT-PCR. Seven sets of tissue samples were dissected from wild type of adult P. placenta, with shell sizes ranging from 50 to 70 mm. All the soft tissue including adduction muscle, gonad, gill, mantle, hepatopancreas, mouthparts, and intestine were dissected from fifteen fresh animals. For qRT-PCR measurement, the weight of each kind of tissue is no less than 300 mg. Each qRT-PCR analysis was replicated for three times. These tissues were kept in RNA Keeper-ICE tissue transfer buffer for RNA extraction. RNA Keeper-ICE tissue transfer buffer penetrates into the tissues during the defreezing process and the RNase of the tissues were inactivated. Total RNA from the different tissues were extracted with TRIzol Reagent at a ratio of 1 mL per 100 mg tissue. The concentration and purity of RNA were quantified using NanoDrop® ND-2000 (Thermo scientific). The RNA integrity was evaluated by Bioanalyzer 2200 (Agilent). The primer sequences used for qRT-PCR are listed in

Result
Sequence analysis and de novo assembly. A total of 29,996,311 raw reads were obtained after Illumina sequencing analysis. Considering that the raw reads contain a small amount of sequencing adapters and low-quality reads, the resulting clean reads were obtained after quality filtration and then subjected to de novo assembly. Firstly, the quality of the extracted RNA is in A grade (RIN > 7, for our data). Secondly, the Q20 percentage and Q30 percentage are 98.54% and 95.94%, respectively, and the error rate is 0.01%, which indicate that RNA sequencing results are qualified and reliable. The percentage of GC content for the clean reads is 44.45% (Table 1). These results show that the obtained unigenes are suitable for annotation. Filtered reads are assembled into 138,384 transcripts, with a total length of 106,972,212 nucleotides. These transcripts are subsequently assembled into 113,325 unigenes and the mean length of transcripts is 697 nucleotides, ranging from 201 to 39,605 nucleotides. N50 length is defined as the unigenes length L, while 50% of all bases in the sequences are in contigs of length less than L. The mean length of unigenes is 773 nucleotides, with N50 of 1,437. In summary, the total length of all unigenes is 79,007,878 nucleotides or 73.9% of the length of all transcripts ( Table 2). The length-frequency distribution for unigenes and transcripts shows that the numbers of transcripts and unigenes decline with the increase of length, while the ratios of unigenes to transcripts with the same length ranges remain almost the same. The number of the unigenes with length less than 500 bp is 76,237, accounting for 66.39% of all unigenes, while less than one-third of long reads (greater than 1000 bp) are assembled into unigenes, indicating that short reads are more likely to be assembled into unigenes (Fig. 1). were further investigated by using the five public databases (Pfam, nr, GO, KOG, and KEGG) with a cutoff E-value of 10 −5 (Fig. 2). The unigenes annotated in the Pfam database contain about 21.48% of all unigenes, followed by 18.99% annotated unigenes in the nr database, 12.16% in the KOG database, 9.5% in the KEGG database, and 7.83% in the GO database. In the alignment with NCBI nr database, 21,516 unigenes were successfully annotated (Table 3). Venn diagram of the numbers of unigenes expressed from public databases nr, Pfam, GO, KOG, and KEGG were shown in Fig. 2, which show clearly that many of the unigenes were annotated from more than one database. Among these successfully annotated unigenes, 46.67% of the annotated unigenes have strong homology with the aligned proteins (E-values less than 1.0E -45 ) in the nr database (Fig. 3a). Nearly half of the unigenes (45.43%) return 60-80% similarities to annotated unigenes, and 21.45% return 80-100% similarity to annotated unigenes (Fig. 3b). 55.51% of the annotated unigenes of mantle of P. placenta matches best with that extracted from pacific oyster C. gigas according to nr database (Fig. 3c). A possible reason for the above homologous genes of the two  www.nature.com/scientificreports/ different animals might be that the C. gigas is one of the few molluscans whose genomes have been completely sequenced and the details of numerous genes of C. gigas have been released. The strong similarity revealed in our study further implies that the genome studies of C. gigas can be reliable references for the study of P. placenta even though the evolutionary relationship between C. gigas and P. placenta is not very close.
To classify the gene functions, the mantle unigenes of P. placenta were investigated by using the GO database. This database is an international standardized gene functional classification system that offers a dynamically updated controlled vocabulary and strict definitions that completely describe the properties of genes and their products in any organism. There are 8,873 unigenes assigned to three main GO categories (Table 3), including biological process category, cellular component category, and molecular function category, and 44 terms in total as shown in Fig. 4. For GO categorization, the "metabolic process" and "cellular process" terms are dominant in the biological process category, the "cell" and "cell part" terms are dominant in the cellular component category, and the "binding" and "catalytic activity" terms were dominant in the molecular function category. The terms "extracellular region" and "extracellular matrix" associated to cellular component category are generally supposed to be related to SMPs. Most SMPs were up-regulated to a greater content at the juvenile stage than those at the adult stage of pearl oyster P. fucata 50 . A total of 31 unigenes related to SMPs are recognized in the terms "extracellular region" from the mantle tissue of P. placenta, such as collagen, chitin binding protein, calcium ion binding protein (Excel S1).
The mantle unigenes of P. placenta were investigated by using the KOG database for better functional classification. 13,782 unigenes of mantle tissues of P. placenta were assigned to 26 categories according to KOG database (Table 3). About 41% of the unigenes are assigned to the three main categories, (R) category "general function prediction only" (2739, 17.62%), (T) category "signal transduction mechanisms" (2445, 15.73%), (O) category "posttranslational modification, protein turnover, chaperones" (1314; 8.5%). In this study, the numbers of unigenes of mantle tissue P. placenta assigned to the P (inorganic ion transport and metabolism) category and T category were 606 (3.9%) and 2445 (15.73%), respectively (Fig. 5, Excel S2). Many of the annotated unigenes assigned to T and P categories in Excel S2 are homologous with ferritin, calcium ion binding protein, calmodulin and other biomineralization-related proteins as reported in the literature. The P category (inorganic ion transport and metabolism) contains some Ca 2+ -related proteins such as Ca 2+ binding protein, Ca 2+ transporting ATPase, Voltage-gated Ca 2+ channels (Excel S2). In contrast, there are less than 10% unigenes unevenly distributed in the remaining categories such as Y category "nuclear structure" and N category "cell motility", accounting for 0.33% and 0.3%, respectively (Fig. 5).
To better understand biological pathways involved in the P. placenta mantle unigenes, all unigenes of mantle tissues were assigned according to the reference canonical pathways in the KEGG database. According to the Table 3. Statistics of the numbers and percentages of unigenes annotated in different databases and percentage of each database.  www.nature.com/scientificreports/ classification results, 10,766 unigenes were assigned to five specific pathways (Table 3). 1,716 unigenes were assigned to metabolism, followed by genetic information processing (1,245 unigenes), organismal systems (741 unigenes), environmental information processing (518 unigenes), cellular processes (472 unigenes). According to the KEGG classification, the pathway of translation owns the largest number of unigenes (634 unigenes), representing for 50.92% of the total annotated unigenes in the category of genetic information processes, followed by pathway of global and overview maps (336 unigenes) in metabolism category (Fig. 6). It was reported that the calcium signaling pathway, Wnt (wingless/int1) signaling and tyrosine metabolism pathways probably play significant roles in the biomineralization process 37,50 . However, no unigene was found in the calcium signaling pathway in the KEGG database from the unigenes of the mantle tissue of P. placenta. 23 unigenes were found to be involved in Wnt signaling and tyrosine metabolism pathways. Among them, the calcyclin binding-protein, frizzled-related protein and dopamine beta-monooxygenase proteins are supposed to be related to calcium signal transduction and the shell formation process (Excel S3). The chemokine signaling and leukocyte transendothelial migration pathways were significantly enriched after the shell damage treatment, indicating that granulocytic hemocytes containing matrix proteins, a type of leukocyte, may transport calcium ions or contribute to the biomineralization process of calcite crystals [51][52][53] . There are four unigenes found in these two signaling pathways according to KEGG database and only C-C motif chemokine may involve the biomineralization process (Excel S2).  (Fig. 7). 62,103 potential indels in total were also identified among 21,630 unigenes with frequency of about 2.87 indels per unigene (Excel S6). The numbers of unigenes decrease gradually with the increase of indel length from 1 to 10 bp (Fig. S1). The indel lengths are mainly 1, 2 and 3 bp. The numbers of deletion are larger than that of insertion when the indel length is more than 10 bp (Fig. S1).

Identification of genes involved in biomineralization process.
To better understand the proteins related to the shell formation process, the annotated unigenes of mantle tissues of P. placenta were compared with the sequences of proteins known to be associated with biomineralization process in molluscan shells using nr database. The annotated unigene with lowest E-value was selected as the representative unigene while several annotated unigenes were assigned to the same reference unigene. 178 homologous unigenes of 51 shell matrix proteins, such as calmodulin, perlucin, ferritin and carbonic anhydrase were found to be related to biomineralization process in the transcriptome of mantle tissue of P. placenta (Excel S10). This is the first time to report the potential biomineralization-related unigenes in the mantle tissue of P. placenta, as far as we know. A lot of researchers have focused on the identification of genes related to the shell formation of molluscan animals, and an increasing number of genes have been identified [55][56][57][58] . 259 proteins were identified from oyster C. gigas shells by proteomic analysis 56 . In comparison with the proteomic data of shell matrix of C. gigas, we identified a set of 158 unigenes that are probably related to shell formation, including house-keeping protein elongation factor 1a, and extracellular matrix protein collagen (Excel S9). Many of the shell-formation related proteins are enzymes such as glutathione peroxidase, hemicentin and tyrosinase, that may be involved in matrix construction or modification 57,58 . In this study, only one enzyme tyrosinase (three unigenes) was found to be related to the shell formation process in the transcriptome of P. placenta (Excel S9). Twenty-one proteins with 66 homologous unigenes were identified to be related to calcitic shell formation of P. placenta (Table 4). Furthermore, eighteen of the above 66 unigenes were identified to be highly expressed in mantle tissue of P. placenta. A few highly Comparison of the mantle transcriptomes of different molluscans. The shells of scallops P. yessoensis, C. farreri and P. placenta are composed of foliated calcite minerals. The mantle transcriptomes of scallop P. yessoensis, C. farreri and P. placenta were compared to find the similarities and differences of the biomineralization related proteins of these molluscan organisms 36,37 . We find 117 biomineralization-related unigenes in the mantle of P. placenta, much less than those in P. yessoensis (162 unigenes), but much more than those in C. farreri (42 unigenes) (Fig. 9). There are six biomineralization-related unigenes expressed in the mantles of the three species, including sarcoplasmic calcium-binding protein, calcineurin a, calmodulin-like protein, perlucin, alkaline phosphatase, tyrosinase-like protein tyr-3. In comparison to C. farreri, P. placenta and P. yessoensis have more similar homologous biomineralization-related unigenes, 31 unigenes in total, about 27% of the biomineralization related unigenes of P. placenta. For example, collagen, chitin synthase 1/2, carbonic anhydrase-like, heat  www.nature.com/scientificreports/ shock protein 70 and calmodulin were found in both P. yessoensis and P. placenta, potentially indicating their functional similarities for their biomineralization processes. There are probably more mineralization proteins to be discovered in the unigenes of the mantle tissue of P. placenta (Fig. 9, Excel S10).
Quantitative Real-Time PCR (qRT-PCR) analysis. Ten selected potential biomineralization-related unigenes in the different tissues such as adduction muscle, gonad, gill, mantle, hepatopancreas, mouthparts, and intestine of P. placenta were examined by qRT-PCR (Fig. 10). In general, four of ten unigenes have much higher expression in mantle tissue than those in the other tissues, which include c76266_g1 (gigasin-2), c73086_g1 (tyrosinase-like), c66761_g1 (pif-like) and c59513_g1 (teneurin-2) unigenes ( Fig. 10a-d). Pif is an important macromolecule for in vivo shell formation of nacre 44 . In addition, it was found that pif can induce the formation of aragonite and vaterite crystals in the in vitro system 77 . Tyrosinases are abundant in shells and their high expression in mantle of the pacific oyster C. gigas indicates that their functions are probably related to shell formation 56 . Gigasin-2 and teneurin-2 were identified for the first time from the shell of C. gigas via shell matrix proteome characterization 78,79 . According to the Pfam database analysis, teneurin-2 in P. placenta is predicted to be epidermal growth factor (EGF) domain and gigasin-2 has zona pellucida (ZP) domain (Excel S11). EGF domains are mostly found in the SMPs as tandem repeats and only present in the prismatic (calcitic) layers but not in the nacreous layer of Pinctada 80 . The EGF domain is a calcium-binding motif composed of 45 amino acids arranged in two small β-sheets with six conserved cysteine residues 81 . Both EGF-like and ZP domains have been reported in the shells of Lottia gigantea 82 and C. gigas 78 . ZP domains are present in a range of extracellular filament or matrix proteins from a wide variety of eukaryotic organisms, and are characterized by eight conserved cysteine residues, which are involved in protein polymerization processes 83 . We propose that the above mentioned four unigenes are primarily biomineralization-related unigenes. However, the expression levels of the other six unigenes in the mantle tissue are not as high as their expression levels in the other tissues, indicating that they probably have other functions except for the biomineralization process ( Fig. 10e-j). We would discuss more about the functions of these ten unigenes in the discussion part.
Pfam database analysis. Pfam database search is important for understanding the possible biomineralization-related functions of the shell matrix proteins of molluscan animals. In molluscan animals, shell matrix proteins are very often repetitive, highly conserved, low complex domains. The functions of these protein domains involved in biomineralization have been studied by many groups 13,84 . For example, the nacre protein perlucin contains a C-type lectin domain and has a broad carbohydrate-binding feature, which was supposed to facilitate calcium-dependent glycoprotein-protein interactions within the skeletal matrix 23 . In molluscan animals, the pif-like proteins contain von willebrand factor type A (VWA), chitin-binding and laminin G domains, which can bind chitin framework and accelerate CaCO 3 precipitation inside the chitin membrane, and then regulate their vertical alignment 85,86 . In bivalves, most EF-hand proteins from the mantle tissue of bivalves are Ca 2+ sensors or signal modulators, which may induce conformational change by binding with Ca 2+ , such as calmodulin, troponin C and myosin light chains 87 .
In this study, the sequences of potential biomineralization genes were obtained from the transcriptomics analysis of P. placenta mantle. Based on these amino acid sequences, the information of their domain was obtained by using Pfam. Finally, we speculate on the function of potential biomineralization-related proteins of P. placenta based on the characteristics of the domain found in biomineralization proteins of molluscan. Identification of the possible biomineralization-related functions of the unigenes expressed in the mantle tissue of P. placenta were carried out by keyword searching according to the domain which were reported to be related to biomineralization in the Pfam database. Many potential biomineralization-related unigenes are proposed to be involved in the shell formation from the transcriptomes of the mantle tissue of P. placenta according to the Pfam database analysis. (Excel S11).
Calmodulin, calponin and mucin proteins are supposed to be associated with molluscan shell formation. Among those proteins, calponin was highly expressed in the mantle of P. placenta, with FPKM values > 551, but  ) ( Excel S11), but it expressed a higher quantity in mouth parts than that in the other tissues (Fig. 10f), suggesting that calmodulin may play other roles in P. placenta. The insoluble matrix shell protein 6 shows a higher expression in the tissues of hepatopancreas and intestine than that of the other tissues in P. placenta (Fig. 10j). Cadherin and collagen proteins contain enriched amount of von willebrand factor type A and epidermal growth factor domains, indicating that they were derived from the extracellular matrix 88 . However, most of them show a low expression level in the mantle tissue of P. placenta. Only c83310_g1 (collagen alpha-1) shows a relatively high expression level (FPKM values in between 11.18 and 32.32) (Excel S11). Perlucin extracted from abalone nacre contains a functional C-type lectin domain which can increase the precipitation rate of calcium carbonate from a saturated solution, indicating that it may promote the nucleation and/or the growth of calcium Table 4. Identification of genes involved in the calcitic shell formation of P. placenta a . a The first column is the accession number of the protein, which could be indexed in the Uniprot. Expression level is shown as FPKM. RecName is the protein name recommended by the UniProt database. AltName represents synonyms found in the literature or in other databases.  15,23 . Among the proteins containing perlucin domains, only the c67461_g2 unigene (perlucin) shows a high expression level (FPKM values: 65.55-100.36) in the mantle tissue of P. placenta (Excel S11). However, c67461_g2 unigene shows higher expression in tissues of hepatopancreas than that of the other tissues by qRT-PCR (Fig. 10e). Carbonic_anhydrase domain was reported to be involved in the formation of both nacreous and prismatic layers 20,24,89 . The expression values of all the carbonic anhydrase unigenes are relatively high, with FPKM values > 27, such as c71108_g1, c85950_g1, c81423_g3 and c84941_g5 unigenes (Excel S11). At the mean time, the expression of c84941_g5 was higher in gill than that in the other tissues (Fig. 10g). On the other hand, some extracellular enzymes or inorganic ion-binding proteins such as chitinase, hemicentin and peroxidasin with relatively high expression levels in the mantle tissue of P. placenta are probably involved in the shell formation (Excel S11). Chitinase in shell matrix may reconstruct the chitinaceous scaffold and promote the interaction between chitin and chitin binding protein 82 . In comparison to the other ion-binding proteins, the c71155_g1 unigene (chitinase 1) has a relatively high expression level (FPKM values in between 96.85 and 125.15) (Excel S11), but we didn't know yet how specific this gene is in each tissue of P. placenta.

Discussion
Shell formation is a very complicated process that involves a series of proteins and genes, while living organisms produce biominerals with superior mechanical properties under biological control 1 . The main objective of this study is to identify unigenes involved in biomineralization. Our study indicates that a high-coverage expression profile can be produced by using short-read Illumina sequencing technology and some effective sequence assembly tools such as Trinity and ESTScan softwares. A total of 113,325 unigenes with an average length of 697 bp was generated from the m antle tissue of P. placenta by using Illumina HiSeqTM 4000 sequencing technology, while 66.39% of the above unigenes (76,237) have lengths less than 500 bp (Fig. 1). The lengths of the unigene sequences from the mantle tissue of P. placenta are larger than those generated in the reported unigene sequences of P. penguin 34 . These unigene sequences are similar to the reports for the yesso scallop P. yessoensis (93,204 unigenes; 733 bp) 36 and the freshwater pearl mussel C. plicata (98,501 unigenes; 689 bp) 33 . However, the number and mean length of the unigenes from the mantle tissue of P. placenta are larger than that from the zhikong scallop C. farreri (77,975 unigenes; 538 bp) 37 and the pearl oyster P. maxima (108,704 unigenes; 407 bp) 90 . This difference may be attributed to the use of different sequencing platforms. A large quantity of genomic data is available for many bivalve species, but only 18.99% of the mantle tissue of P. placenta were annotated in nr database. This means that more than half of the unigenes of the mantle tissue of P. placenta have no known homologous unigenes. The low rate of annotated unigenes of the mantle tissue of P. placenta could be a result of limitations in the genomic information available for P. placenta, which is the case in many of other bivalve species 33,36 . The SNPs of the mantle tissue of P. placenta were obtained from the sequencing errors as well as the true SNPs. In our data, the obtained SNP density was 0.0059 SNPs/bp (0.59%) in the mantle tissue of P. placenta (Excel S5, Table 2), which was significantly larger than the sequencing error (0.01%) ( Table 1). This result confirmed the reliability of the obtained SNPs data. The SNPs are potentially useful for genetic linkage mapping and for the analysis of quantitative traits of the P. placenta. The transcriptome we present here provides the most comprehensive polymorphism for the P. placenta to date, as far as we know. The SNP density of the eastern oyster C. virginica is 0.042 SNPs/bp 54 , which was more polymorphic than that of the mantle of P. placenta (0.0059 SNPs/ www.nature.com/scientificreports/ bp) (Excel S5, Table 2). 465,392 potential SNPs were constructed from 60,371 unigenes, with frequency of about 7.56 SNPs per unigene in the mantle tissue of P. placenta, which was consistent with the reported result of the mantle tissue of pearl oyster P. martensii 35 . The indel density of the mantle tissue of P. placenta was 2.87 indels per unigene (Excel S6), which was much lower than that indel density of the mantle tissue of P. martensii 35 . As is well known, the Illumina sequencing is considered to be robust against homopolymer errors and therefore it may be suited well for identification of indels 91 . The variations in unigene expression between different tissues have been shown to be correlated with shell formation in molluscans such as C. gigas 22 , P. penguin 34 , and T. pyramis 32 . The exact biomineralization functions of proteins such as perlucin 15 and pif 85 have been investigated by in vitro and in vivo mineralization studies. According to the nr database, 178 potential biomineralization-related unigenes were identified in the mantle transcriptome of P. placenta in this work. Among these unigenes, ten selected potential biomineralization-related unigenes (FPKM values > 15) were examined in the different tissues of P. placenta by qRT-PCR. In the present study, four of ten unigenes (gigasin-2, tyrosinase-like, pif-like and teneurin-2) have much higher expression in the mantle tissue than those in the other tissues, indicating that they are very often related with the biomineralization process of P. placenta shell (Fig. 10a-d). Three gigasin-2 isoforms were identified in water soluble matrix of C. gigas shell, which are proposed to be involved in bone remodeling processes and could be responsible for the biocompatibility between bone and nacre grafts 79 . Meanwhile, gigasin-2 was highly expressed in the mantle tissue of C. gigas 56 . However, homologous proteins have not been identified in other species since gigasin-2 was reported in 2012 56 . It is known that tyrosinase family are potentially involved in melanin biosynthetic pathway in various organisms. Moreover, it was reported that tyrosinases from molluscans are secreted from the mantle and transported to the prismatic layer of the shell, while they contribute to melanin biosynthesis and shell pigmentation 56,92,93 . In this study, there is no pigmentation in the mantle tissue and the transparent shell of P. placenta. Therefore, the high expression of tyrosinase in the mantle tissue indicates that their functions are not only related to melanin biosynthesis, but also related to the shell formation. After injection of pif dsRNA, both of calcite laths of the C. cigas shell and nacreous layer of the P. fucata shell grew to disordered structure in vivo, indicating that pif protein might be essential for the normal growth of the prismatic and nacreous layer 44,77 . The teneurin-2 was first identified from diverse shell matrix proteome and had signal peptides in C. gigas, and it was proposed that it was secreted from the mantle into the shell 78 . According to the above discussion about the expressions of the four unigenes and their functions for biomineralization process in the other molluscan species in the literatures, we propose that gigasin-2, tyrosinase-like, pif-like and teneurin-2 may play important roles for the biomineralization process. It would be important to extract these four proteins and investigate their functions for biomineralization via in-vitro crystallization process of CaCO 3 in the future. We consider to study the full-length cDNA sequences, gene expression and recombinant proteins of these four unigenes to understand their functions for in vivo and in vitro crystallization of CaCO 3 .
Six of the remaining unigenes don't exhibit high expression in the mantle tissue, in comparison to the other tissues of P. placenta. Thus, it is hard to tell whether they participate in biomineralization or not based on the qRT-PCR analysis. These findings are somewhat similar from those of previous studies. For example, the researchers identified six types of perlucin and discovered their different expression levels in different tissues of T. pyramis. Some of the perlucin proteins were expressed at the highest levels in the digestive gland, while the others were expressed at high levels in the mantle or the gonad of T. pyramis 32 . The perlucin was isolated from the nacreous layer of the marine snail Haliotis laevigata and it could promote the nucleation of CaCO 3 crystals on the calcite surface in the in vitro experiments 94 . In addition, calmodulin-like protein can induce the nucleation of aragonite through binding with the 16-kDa protein and regulates the growth of calcite in the prismatic layer of pearl oyster P. fucata 95 . This expression pattern and the in vitro crystallization experiments suggest that perlucin family may play important roles in both of the biomineralization process and digestive process 32,94 . Similar to the perlucin, two types of calmodulin were expressed at the lowest level in the mantle than the other tissues in T. pyramis 32 . Calmodulin-like protein was expressed with the highest level in the mantle tissue of P. fucata species and has a potentially high affinity for calcium 96 . The carbonic anhydrase family were expressed in the mantle and associated with the shell formation in the european abalone Haliotis tuberculata 97 . Nacrein containing carbonic anhydrase domain was expressed in both the nacreous layer and the prismatic layer of P. fucata 24 . Meanwhile, it was also highly expressed in the mantle of T. pyramis and P. penguin 32,34 . However, the researchers analyzed the expression levels of the five studied carbonic anhydrase isoforms in different tissues and found that four of them were more highly expressed in the hemocytes than in the gills or the mantle in C. gigas 22 . In P. placenta, one of the perlucin family was identified by qRT-PCR and showed a highest expression in the digestive gland than that in the other tissues (Fig. 10e). The expression level of c81494_g1 unnigene (calmodulin-like) was the highest in the mouthpart, moderate in the gills, very low in the mantle tissue (Fig. 10f). The expression of carbonic anhydrase-like was the highest level in the gill tissue, medium in the mantle in P. placenta (Fig. 10g). Based on the qRT-PCR and in vitro crystallization results in the reported literature, we conclude that the above mentioned three unigenes, perlucin, calmodulin-like and carbonic anhydrate-like unigenes are potentially related to the biomineralization process of P. placenta.
As a member of the calmodulin subfamily of EF-hand Ca 2+ -binding proteins, Caltractin was first identified in C. gigas 56,98 . Another unigene calreticulin is also a calcium-binding protein, it was primarily involved in the unfolded protein response to cellular stress (temperature, salinity, air exposure and heavy metals) in the endoplasmic reticulum 14,56,99 . Both of the two calcium-binding proteins, calreticulin-like and caltractin-like exhibit relatively low expression in the mantle tissue according to the qRT-PCR results (Fig. 10h,i). However, we consider that they may probably have some kind of function for biomineralization process since they are calcium-binding proteins 100

Conclusions
In conclusion, the transcript dataset of the mantle tissue of P. placenta was investigated in details by using Illumina HiSeqTM 4000 platform and public unigene databases. The identified and annotated unigenes provide valuable genomic resources for the understanding of the biomineralization mechanism. More than half of the annotated unigenes of the mantle tissue of P. placenta are consistent with those proteins from the pacific oyster C. gigas according to nr database. The transcripts of mantle tissue of P. placenta were identified with SNP, SSR and indel markers. These SNP markers, SSR markers and primers may be used in the construction of a genetic linkage map and gene-based association studies. 66 homologous unigenes of 21 shell matrix proteins in the transcriptome of mantle tissue of P. placenta were found to be related to the calcitic shell formation, while eighteen of the above unigenes are highly expressed with FPKM larger than 15 in the mantle tissue. Furthermore, qRT-PCR analysis for ten of highly expressed homologous unigenes (FPKM > 50) related to biomineralization from six different tissues of P. placenta indicate that seven of them are potentially related to the biomineralization process of the calcitic shells of P. placenta. Especially, the qRT-PCR analysis shows that four of ten examined unigenes including teneurin-2, gigasin-2, pif-like, tyrosinase-like unigenes have the highest expression levels in the mantle tissue than the levels in the other tissues, indicating their primary functions for biomineralization process. This study can contribute to the understanding of the molecular mechanisms and the functional components of the proteins that involve the biomineralization process of the calcite foliated plates of P. placenta. The transcriptomic data generated in this study provide a basis for further studies of P. placenta genome. Moreover, the comparison of potential biomineralization genes also reveals the similarities and differences between shell formation matrix of different molluscan animals.

Data availability
All relevant data reported here are included in the main section of the manuscript or in the supplementary materials. All described materials are available upon request.