Background & Summary

Insects are the most diverse and largest class of animals on Earth, occupying in nearly all terrestrial ecological niches. Owing to this great diversity and the long-time coexistence, an amazing variety of symbiotic microorganisms have adapted specifically to insects as hosts, and participate in many relationships with the hosts14. In particular, the gut of most insects harbors a rich and complex microbial community with considerable metabolic activity5,6, which range from enhancing host energy metabolism to shaping immune system711. For example, various polysaccharide degrading bacteria were identified from herbivorous insect gut, which produce enzymes degrading otherwise host-indigestible plant component (e.g. cellulose, xylan)1216. The native gut microbiota is also more and more recognized to play as an “extended immune system” for the host against harmful microbes17. Abundant lactic acid bacteria maintain in biofilms within honeybees (Apis mellifera) and work in a synergistic matter to inhibit pathogen proliferation in the gut by producing a mixture of antimicrobials18.

Although Lepidoptera, including butterflies and moths, is one of the largest insect orders and a primary group of phytophagous agricultural pests, little is known about the microbes associated with them19,20. Indeed, by using high-throughput sequencing techniques, recently several studies have reported abundant and diverse bacteria in lepidopteran guts8,2124, but the functional significance of their gut microbiomes still remains undetermined. As a lepidopteran model organism and domesticated insect25, the silkworm Bombyx mori (Lepidoptera: Bombycidae) is important not only for basic research but also for providing raw materials to the textile and biotechnology industry16,2628. Based on this model insect, the metagenomic analysis could form the basis for further research of lepidopteran microbiome.

Here, using next-generation sequencing, we present shotgun gut metagenomes from two most common silkworm strains, namely Dazao (P50) and Qiufeng × Baiyu (QB). The inbred P50 silkworms are extensively used worldwide, as the standard strain for B. mori research; while the hybrid QB silkworms are widely used in local commercial silk production, which have a higher growth rate (Fig. 1a) and a larger cocoon size than P50 (Fig. 1b). Sample information was detailed in Table 1. As a herbivorous insect, the gut of silkworm is full filled with plant tissues, making it necessary to separate bacterial cells from the gut content to avoid plant DNA contamination. Thus, a filtration and density gradient centrifuge procedure was applied to enrich gut bacteria from the silkworm13. After bacterial DNA extraction, the metagenome was sequenced and analysed as the flowchart shown in Fig. 1c.

Figure 1: Silkworm (Bombyx mori) strains used for this study and overview of the experimental design.
figure 1

(a) The hybrid QB silkworm has a higher growth rate than P50. (b) Cocoon size and shape of P50 (yellow) and QB (white). (c) Workflow used to process silkworm gut samples to generate metagenomes. (d) Length distributions of filtered metagenome sequencing reads. (e) The functional gene rarefaction curve for each strain. (f) The taxonomical diversity rarefaction curve for each strain.

Table 1 Sample information in this study.

Shotgun sequencing produced 6.826 and 10.369 Giga base pairs (Gbp) of unassembled sequence data from P50 (MS1P50) and QB (MS1QB) samples (Table 2). In total, 44,047,886 and 67,718,490 sequences passed the quality control in the MS1P50 and MS1QB dataset, respectively. Read length distribution after filtering revealed most of sequences between 201–600 bp (Fig. 1d), and rarefaction curves tended towards saturation (Fig. 1e,f). The metagenomes were assembled separately into 91,037 and 44,201 scaffolds with 53.91 and 54.49% GC content in P50 and QB, respectively (Table 2). After ORF prediction, 148,685 ORFs from MS1P50 and 75,232 ORFs from MS1QB were identified. Table 3 summarizes functional gene annotation against various databases. By using both BlastP and MG-RAST protocols, the metabolism was found to be the major part of silkworm gut microbiome function (Fig. 2). From the Nr database output, 2,307,446 and 5,036,416 reads of P50 and QB were aligned to this category respectively, indicating that nutrient digestion and synthesis were most important aspects and microbial fermented products such as lactic acid, butyrate and vitamins2931, could also be supplied to the host. Table 4 reveals that most reads ( >99%) identified from silkworm gut metagenome belong to the domain Bacteria. Taxonomic diversity was analysed not only with shotgun metagenomic data by different tools (BlastP against Nr database and MG-RAST against GenBank), but also by direct 16 S rRNA sequencing32, which showed the same tendency (Table 5). Enterococcus, Acinetobacter, Bacillus and Enterobacter are dominant species in both strains (Fig. 3) and have previously been found in silkworms33.

Table 2 Metagenome sequencing statistics reported in this study.
Table 3 Annotation summary.
Figure 2: Composition of metabolism category in COG.
figure 2

BlastP (denoted with circle symbols) and MG-RAST (denoted with triangle symbols) methods are used respectively to query against COG database. Best hits of two strains are shown in the bar plot in the upper right panel.

Table 4 Domain coverage of shotgun reads.
Table 5 Taxonomic composition (top 15) revealed by 16S rRNA sequencing and shotgun metagenomic sequencing.
Figure 3: Species abundance and phylogenetic relationship of gut bacteria between P50 and QB.
figure 3

The phylogenetic tree is shown at the genus level, colored by the phylum. Bacterial abundance is indicated in outer ring with shape plot (P50, blue circle; QB, red circle). The size of circle represents sequence log10 reads per genus.

Altogether, the community structure and functional genes described here could be used for further exploring the potential relationship between the Lepidoptera host and commensal bacteria, thereby paving the way for developing novel strategies to promote silk production and enhancing studies on insect symbiosis.

Methods

Insect rearing and sample collection

Eggs of P50 and QB were provided by the Silkworm Quality Inspection and Quarantine Station, Department of Agriculture, Zhejiang Province, China. Silkworms were hatched in an incubator at 28 °C, 100% RH, and maintained in sealed plastic boxes (50 × 25 × 10 cm) with light-dark regime (16:8) at 25 °C and 70% RH. The 5th-instar larvae feeding with fresh mulberry leaves were used in this study. After 5 days feeding ad libitum, a hundred of P50 and QB silkworms were collected.

Gut dissection was performed as described previously34. Freeze-killed silkworms were washed with ddH2O for three times after surface-sterilization in 70% ethanol for 30 s. The larvae were dissected on ice in a clean bench. Considering that a large amount of bacterial DNA is needed for shotgun metagenomic sequencing and a risk for sample loss during purifying bacteria, dissected guts were pooled for DNA extraction and subsequent sequencing. Briefly, for each silkworm strain, 100 guts were homogenized with a hand-held homogenizer (PRO scientific, Monroe, USA). In order to avoid the plant and host tissue debris contamination, several steps of filtering were applied followed by a protocol specific for gut bacteria enrichment35. 100 mL 10 mM MgSO4 was then added to homogenized samples before being passed through 20 μm and 11 μm filters (Millipore, Bedford, USA). Each sample was centrifuged at 4000 rpm for 15 min. The pellet was resuspended with 200 μL ddH2O for further separation. 40 and 80% Percoll (GE Healthcare, Uppsala, Sweden) containing 10 mM MgSO4, 0.01% bovine serum albumen, 0.01% ficoll (Sangon, Shanghai, China), 0.05% polyethyleneglycol 6000 (Sangon, Shanghai, China), and 0.086% sucrose were used for the density gradient centrifugation. Each 7-mL centrifuge tube (Hitachi, Tokyo, Japan) was filled with 2.5 mL 80% Percoll on bottom layer, and 3.5 mL 40% Percoll on the top layer. Next, 1 mL of the sample was placed gently on the top of 40% gradient layer. The prepared tubes were centrifuged at 12,000 rpm, 4 °C for 10 min and bacterial cells were collected at the interface between the 40 and 80% Percoll solutions. Collected cells were washed by 1 mL of 10 mM MgSO4 for three times (4000 rpm, 5 min). The pelleted bacteria were used for DNA extraction.

DNA extraction and shotgun metagenomic sequencing

To remove the host genetic contamination in samples, DNase I (Epicentre, Madison, USA) was added in and incubated at 37 °C for 30 min. After DNA digestion, 1 volume of 0.5 M EDTA (Sangon, Shanghai, China) was added and heated at 72 °C for 2 min to inactivate the DNase enzyme. Bacterial DNA was extracted from the washed cells by using the MasterPure™ Complete DNA and RNA Purification Kit (Epicentre, Madison, USA). The extracted DNA was quantified for the Illumina high-throughput sequencing (Illumina, San Diego, CA). 3 μg genomic DNA per sample was used for the sequencing library preparation. DNA samples were first fragmented into 300 bp fragments by the Covaris M220 shearing system (Covaris, Woburn, USA). Illumina TruSeq™ DNA Sample Prep Kit (Illumina, San Diego, CA) was then employed for the generation of 150 bp pair-end (PE) libraries following manufacturer’s recommendations. Illumina cBot Truseq PE Cluster Kit v3-cBot-HS (Illumina, San Diego, CA) was used for PCR reaction to enrich DNA fragments with ligated adapters on both ends. NGS sequencing was performed by Illumina Hiseq 2500 sequencing platform (Illumina, San Diego, CA), resulting two fastq files for each run.

Metagenome sequence processing

Sequences with an average read length of 150 bp were introduced into SeqPrep (https://github.com/jstjohn/SeqPrep) to remove the adapter at 3′ ends of raw reads. Raw sequences containing 3 or more unknown nucleotides (‘N’) were trimmed by Sickle (https://github.com/najoshi/sickle) command “sickle pe” to obtain clean reads longer than 20 bp, and to ensure that filtered reads possessed quality threshold greater than 20. Clean reads were then assembled by SOAPdenovo36 at 39–47 k-mers. Scaffolds < 500 bp were excluded. To construct a non-redundant gene set, CD-HIT37 was employed to cluster assembled reads at 90% coverage and 95% identity, using the longest read as the representative sequence. All high quality reads were aligned (95% identity) against non-redundant database using SOAPaligner (http://soap.genomics.org.cn/) to obtain the gene abundance in each sample.

Metagenome annotation

After extraction of the non-redundant gene set, annotation was performed against Nr database by BlastP (v2.2.28+ ) at an e-value cutoff of 10−10, and the dataset was also processed by the web-based metagenomics RAST server (MG-RAST)3840. Various databases (Table 3) were used for annotation41. KEGG42 and COG43 were employed too for the alignment of functional genes. To get the information about carbohydrate active enzymes and antibiotic resistant genes, sequences were compared in the CAZy44 and ARDB45 databases, respectively. Proteins transported through the secretory pathway, which contain signal peptides, were identified by SignalP programme46. The ratio of annotated genes were determined by read number of the hits to the non-redundant database. From the annotation results, contamination sequences of the host silkworm were removed. Rarefaction curve was generated by R package vegan, based on the output file of Nr annotation.

16S rRNA sequencing and taxonomic analysis

For amplicon sequencing library preparation, a 50 μL PCR reaction system containing 10 μL 5 × FastPfu reaction buffer, 2.5 U FastPfu Polymerase (Transgene, Beijing, China), 250 μM dNTPs, 200 nM of primers, 1 μL of DNA sample and DNA-free water was performed twice to link the sequencing adapter to the PCR products. The resulting fragments were pooled together equally and quantified by Quantifluor dsDNA system (Promega, Madison, USA). PE sequencing was performed on an Illumina MiSeq instrument (Illumina, San Diego, CA).

Raw PE reads were merged by FLASH software (v1.2.7), and trimmed with Trimmomatic (v0.36) (Q > 20, N bases < 1%). Clean reads were run through UCHIME (v7.1) to remove all chimeric sequences. UCLUST implemented in QIIME (v1.8.0)47 was used for OTU clustering at threshold 97%, then the representative sequences were selected by the pick_rep_set.py script in QIIME. Taxonomic classification was performed using RDP Classifier (v2.12) with a confidence cutoff of 0.8. Phylogenetic tree was used to exhibit the gut microbiota composition of silkworm48. The longest read of each bacteria genus was picked out to generate the tree (Maxium-likelihood tree, Tamura-Nei model and bootstrap 1000)49, and visualized using iTOL50. Unclassified OTUs were further identified with BLASTN51.

Code availability

No custom code was used to generate or process these data. Software versions employed are as follows:

BlastP (v2.2.28 + )

FLASH (v1.2.7)

Trimmomatic (v0.36)

UCHIME (v7.1)

QIIME (v1.8.0)

RDP Classifier (v2.12)

iTOL (v3)

Data Records

Raw data of shotgun metagenomic sequencing (fastq file) are available from the NCBI’s Sequence Read Archive (Data Citation 1 and Data Citation 2). Raw data of 16 S rRNA sequencing (fastq file) are available from the NCBI’s Sequence Read Archive (Data Citation 3 and Data Citation 4). All shotgun metagenomic sequencing data can be also found at the MG-RAST server (for the strain P50: http://www.mg-rast.org/mgmain.html?mgpage=overview&metagenome=mgm4767611.3; strain QB: http://www.mg-rast.org/mgmain.html?mgpage=overview&metagenome=mgm4754041.3). The annotation outputs generated in this work include the non-redundant gene set.fa file (Data Citation 5) and the Annotation_results.xlsx file produced through BlastP, MG-RAST, SignalP, and comparisons to the CAZy and ARDB databases (Data Citation 5). The outputs of taxonomic analyses include: (i) an OTU table BIOM file with taxonomic assignment generated from 16 S rRNA sequencing (Data Citation 5) (ii) bacterial abundance at the genus level (Data Citation 5) (iii) aligned sequences for generating the phylogenetic tree (Fig. 3) in Newick format (Data Citation 5), and (iv) taxonomic assignment based on all bacterial gene sequences identified in the shotgun assemblies (Data Citation 5).

Technical Validation

Like other herbivorous caterpillars, the silkworm infests a large amount of plant tissues and its gut full filled with mulberry leaf materials. Therefore, direct DNA extraction from gut content for metagenome analysis commonly fails to capture sufficient bacterial sequences, being masked by overwhelmingly abundant plant DNA sequences, such as the chloroplast contamination. To overcome this limitation, we first filtered the homogenized gut tissue to remove most of plant particles from our sample. Then a Percoll-based density gradient centrifugation was performed to enrich gut bacteria accordingly13. Moreover, before DNA extraction, DNase was applied to remove host DNA contamination from bacterial cell suspension, finally this enzyme was inactivated by heating at 72 °C for 2 min.

For assessing shotgun metagenome data, we compared two universal protocols with the dataset, namely BlastP and MG-RAST38,39, both providing integrative analyses of sequences. Overall, the MG-RAST analysis agreed with BlastP analysis for both taxonomy and functional results (Fig. 2), indicating that there was no obvious technical bias for analyzing shotgun metagenomes in this study. Furthermore, for the 16 S rRNA gene classification, BLASTN was employed to identify the reads labelled “unclassified” at the genus level. The comparison of bacterial taxonomy data between 16 S rRNA sequencing (based on the RDP database) and shotgun metagenomic sequencing (based on Nr database) results verified the community structure.

Additional information

How to cite this article: Chen, B. et al. Comparative shotgun metagenomic data of the silkworm Bombyx mori gut microbiome. Sci. Data. 5:180285 doi: 10.1038/sdata.2018.285 (2018).

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.