Background & Summary

Silkmoth chorion proteins, the main components of the eggshell, are sequentially synthesized and secreted by follicular epithelium cells with a high degree of developmental programming1. The structural genes for chorion proteins form a multigene family whose members are grouped into α and β branches based on their evolutionarily conserved central domains2. According to the timing of their developmental expression and their amino acid composition, chorion proteins are further classified into six subgroups: early A, early B, middle A, middle B, late high-cysteine A (HcA) and late high-cysteine B (HcB)3. Based on genetic linkage mapping, the chorion genes are located between the larval marker p at the proximal end of chromosome 2 and the cocoon color marker Y46. The recent silkworm genome assembly7 places the chorion locus at positions 1,780,900–3,840,078 on chromosome 2, although this region is heavily interrupted by gaps caused by highly repetitive sequences.

A high-quality BAC library, designated RPCI-96 (RP96), was constructed from genomic DNA of posterior silk glands of day 3 fifth instar silkworm larvae, partially digested with EcoRI8, and is available from BACPAC Resources of the Children’s Hospital Oakland Research Institute (BACPAC Resources Center [bacpac.chori.org/]). Here we adopted the following strategy to obtain the complete sequence of the chorion locus (Fig. 1): ESTs of chorion genes were used as probes to screen the BAC library, and selected clones were used to construct a BAC contig covering the complete chorion locus (Fig. 2b). By Sanger sequencing of the BAC contig, we obtained the complete sequence of the chorion locus, spanning 871,711 base pairs on chromosome 2, in which we annotated 127 chorion genes (Fig. 2c).

Figure 1

Schematic overview of the study.

Figure 2: Distribution of genome assembly, BAC contig and annotated chorion genes in the chorion locus.

Probes are marked by stars: early chorion genes (black stars); middle chorion genes (green stars); late chorion genes (red stars). The probes used here are presented in Table 1. (a) Diagram of the chorion locus in the B. mori genome assembly. Arrows and dotted lines represent scaffolds and gap regions, respectively (edited from KAIKObase). (b) BAC contig that covers the chorion locus. Each black line represents a complete BAC region. Six BACs were sequenced; 544H24 was not, because its sequence was already known. (c) Early, middle, late and non-chorion genes are highlighted in black, green, red and yellow, respectively.

In this paper, we report and describe in detail the methods, data and quality measurements for the construction and sequencing of the silkmoth chorion BAC contig, together with its sequence data and annotation, which are presented only briefly in the ‘Materials and Methods’ section of a related research paper9. That paper provides additional information for a comprehensive understanding of the structure, transcription and proteomics of genes in the chorion locus. Our strategy can serve as a model for sequencing selected loci in the genomes of other species that contain highly repetitive sequences.

Methods

EST analysis of follicular cell and ovary cDNA libraries

To identify chorion gene transcripts, we analyzed ESTs from two newly constructed cDNA libraries: fcP8, derived from day 8 pupal follicular cells, and bmov, derived from day 4 pupal ovaries. All ESTs from the bmov and fcP8 cDNA libraries are available from the DNA Data Bank of Japan (acc # FY000001-FY021573 for bmov; BY918786-BY920388 and BY927072-BY928825 for fcP8). We identified ESTs of chorion genes by BLASTx searches against public protein databases, including the NCBI non-redundant (nr) database.
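As an illustration of how chorion ESTs might be picked out of BLASTx results, the sketch below assumes BLAST+ tabular output produced with `-outfmt "6 qseqid sseqid pident length evalue bitscore stitle"`. It keeps the best-scoring hit for each EST and flags the EST when that hit's title mentions "chorion" and its E-value passes a cutoff. The column layout, the 1e-10 cutoff, the keyword rule and the toy hits are assumptions for illustration, not the criteria used in this study.

```python
import csv
import io

# Assumed column order, matching BLAST+ -outfmt "6 qseqid sseqid pident length evalue bitscore stitle"
COLUMNS = ["qseqid", "sseqid", "pident", "length", "evalue", "bitscore", "stitle"]


def best_hits(tabular_text):
    """Return the highest-bitscore hit for each query EST in BLAST tabular text."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        query = hit["qseqid"]
        if query not in best or hit["bitscore"] > best[query]["bitscore"]:
            best[query] = hit
    return best


def chorion_ests(tabular_text, max_evalue=1e-10, keyword="chorion"):
    """Hypothetical rule: an EST is called 'chorion' if its best hit title contains the keyword."""
    return sorted(
        query
        for query, hit in best_hits(tabular_text).items()
        if hit["evalue"] <= max_evalue and keyword in hit.get("stitle", "").lower()
    )


# Toy BLASTx hits (made up for illustration).
example = (
    "est1\tsp|P1\t85.0\t120\t1e-40\t150\tchorion class B protein\n"
    "est1\tsp|P2\t40.0\t90\t1e-5\t45\tunrelated protein\n"
    "est2\tsp|P3\t60.0\t100\t1e-20\t90\tribosomal protein L3\n"
    "est3\tsp|P4\t70.0\t50\t1e-3\t30\tChorion protein fragment\n"
)
print(chorion_ests(example))  # est3 fails the E-value cutoff
```

Taking only the best hit per EST keeps a weak, incidental chorion match from outranking a stronger hit to an unrelated protein.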

BAC screening

The silkworm BAC library (RPCI-96) used in this paper was obtained from the BACPAC Resources Center, Children’s Hospital Oakland Research Institute, and has been described previously8,10. BAC clones derived from the chorion locus were screened by hybridizing BAC high-density replica (HDR) filters, on which the RPCI-96 BAC clones are arrayed in duplicate (BACPAC Resources Center [bacpac.chori.org/]), with ESTs of 10 chorion genes selected as representatives of the three chorion families. These probes gave strong hybridization signals with multiple BACs, some of which cross-hybridized with probes from different chorion families. A list of ESTs used for BAC screening is presented in Table 1. Labeling, hybridization and detection were performed using the ECL Direct Nucleic Acid Labeling and Detection System (GE Healthcare UK Ltd., Little Chalfont, Buckinghamshire, UK), in accordance with the manufacturer’s instructions8.
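Because the HDR filters carry every clone in duplicate, a common way to score them is to call a clone positive only when both of its spots give signal above background; a clone detected by probes from different chorion families can then be flagged as cross-hybridizing. The sketch below assumes that scoring rule, a single intensity threshold and made-up clone IDs; it is not the scoring procedure reported for this screen.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class SpotPair:
    clone_id: str    # hypothetical clone identifier
    signal_a: float  # intensity of the first duplicate spot (arbitrary units)
    signal_b: float  # intensity of the second duplicate spot


def positive_clones(spots: Iterable[SpotPair], threshold: float) -> List[str]:
    """Assumed scoring rule: a clone is positive only if BOTH duplicate spots exceed the threshold."""
    return [s.clone_id for s in spots if min(s.signal_a, s.signal_b) > threshold]


def cross_hybridizing(hits_by_probe: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each clone detected by more than one probe to the sorted list of those probes."""
    probes_per_clone: Dict[str, Set[str]] = {}
    for probe, clones in hits_by_probe.items():
        for clone in clones:
            probes_per_clone.setdefault(clone, set()).add(probe)
    return {c: sorted(p) for c, p in probes_per_clone.items() if len(p) > 1}


# Toy data: 002B02 lights up on only one of its two spots, so it is not called.
spots = [SpotPair("001A01", 900, 850), SpotPair("002B02", 900, 40), SpotPair("003C03", 600, 700)]
print(positive_clones(spots, threshold=500))
print(cross_hybridizing({"early_probe": ["001A01", "004D04"], "middle_probe": ["003C03", "004D04"]}))
```

Requiring both spots guards against single-spot artifacts such as dust or background speckles on the membrane.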

Table 1 ESTs used as probes for screening BAC clones.

Construction of a BAC contig covering the chorion locus

Screening an HDR filter of the RPCI-96 silkworm BAC library by hybridization with EST probes of representative chorion genes from the fcP8 cDNA library yielded 202 positive BAC clones from the early, middle and late chorion gene regions. Among the positive BAC clones, we chose the strongly positive clones 077P06 and 094B01 for early chorion genes, 081P21 and 076K18 for middle chorion genes, and 018E13 for late chorion genes. We also selected clone 503L05, which gave a strong positive signal and, based on its BAC end sequence BES_503_L05 (acc # DE379518) in KAIKObase (http://sgp.dna.affrc.go.jp/KAIKObase/), was known to cover a non-chorion domain of the locus, and BAC 544H24, whose full sequence was already known to align with the 3′ part of the chorion locus and the neighboring region7. We performed contig construction for these BAC clones with the fingerprinting method described previously10. This yielded two contigs: one composed of four BACs covering the 5′ half of the chorion locus, and the other composed of three BACs aligning with the 3′ half of the chorion locus (Fig. 2a). One of the 076K18 BAC end sequences, BES_076_K18 (acc # DE307437), aligned to Bm_scaf166 at [chr2: 2,636,193-2,636,430], and the 077P06 BAC end sequence BET_077_P06 (acc # DE354956), at the 5′ end of the other BAC contig, was located on the same scaffold, Bm_scaf166, at [chr2: 2,647,297-2,647,961]. Thus, the two BAC contigs, connected through Bm_scaf166, covered the whole chorion locus (Fig. 2a).
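The bridging argument above rests on two BAC-end alignments falling on the same scaffold. The sketch below counts the bases lying between two such alignments, using the coordinates reported for BES_076_K18 and BET_077_P06 (treated as 1-based, inclusive chromosome 2 positions). The resulting distance of about 10.9 kb is only the spacing of the two end alignments in the assembly; the size of the gap between the BACs themselves also depends on BAC orientation and on how accurate the assembly is in this repetitive region. The `Hit` type and `separation` function are hypothetical helpers.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    """Hypothetical record for one BAC-end alignment."""
    name: str
    scaffold: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def separation(a: Hit, b: Hit) -> Optional[int]:
    """Bases strictly between two hits on the same scaffold.

    Returns None if the hits lie on different scaffolds; a negative value
    means the two intervals overlap.
    """
    if a.scaffold != b.scaffold:
        return None
    left, right = (a, b) if a.start <= b.start else (b, a)
    return right.start - left.end - 1


# Coordinates as reported in the text (chromosome 2 positions on Bm_scaf166).
k18 = Hit("BES_076_K18", "Bm_scaf166", 2_636_193, 2_636_430)
p06 = Hit("BET_077_P06", "Bm_scaf166", 2_647_297, 2_647_961)
print(separation(k18, p06))  # 10866
```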

Genomic sequencing

Six BAC clones from 384-well plates11 were streaked separately on LB plates containing chloramphenicol. Three single colonies from each plate were checked by PCR with primers designed from the end sequences of each BAC (Table 2) to confirm that they carried the correct BAC clone. The BAC clones were then cultured in LB medium for BAC DNA isolation. BAC DNA was extracted using a Large-Construct Kit (QIAGEN) in accordance with the manufacturer’s instructions. Shotgun libraries with 2 kb and 5 kb inserts were constructed for each BAC in the pUC118 vector12. For each library, approximately 590 clones were picked for bidirectional sequencing on an ABI3730 DNA Analyzer (Applied Biosystems).
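The clone names used in this paper (for example 077P06 or 544H24) look like plate, row and column identifiers for a library stored in 384-well plates, which have 16 rows (A–P) and 24 columns. Assuming that naming convention (the text does not spell it out), a small parser can catch mistyped identifiers before clones are picked and streaked, and can rebuild BAC-end labels in the style used above (BES_503_L05, BET_077_P06). All function names are hypothetical.

```python
import re
from typing import NamedTuple

ROWS = "ABCDEFGHIJKLMNOP"  # the 16 rows of a 384-well plate
N_COLUMNS = 24             # the 24 columns of a 384-well plate
CLONE_ID = re.compile(r"^(\d{3})([A-P])(\d{2})$")  # assumed format: plate, row, column


class Well(NamedTuple):
    plate: int
    row: str
    column: int


def parse_clone(clone_id: str) -> Well:
    """Split an ID such as '077P06' into plate 77, row 'P', column 6."""
    match = CLONE_ID.match(clone_id)
    if match is None:
        raise ValueError(f"not a plate/row/column clone ID: {clone_id!r}")
    well = Well(int(match.group(1)), match.group(2), int(match.group(3)))
    if not 1 <= well.column <= N_COLUMNS:
        raise ValueError(f"column out of range 1-{N_COLUMNS}: {clone_id!r}")
    return well


def well_index(well: Well) -> int:
    """0-based position of the well on its plate, reading row by row."""
    return ROWS.index(well.row) * N_COLUMNS + (well.column - 1)


def bac_end_label(clone_id: str, prefix: str = "BES") -> str:
    """Rebuild labels in the style used in the text, e.g. BES_503_L05."""
    well = parse_clone(clone_id)
    return f"{prefix}_{well.plate:03d}_{well.row}{well.column:02d}"


for clone in ["077P06", "094B01", "081P21", "076K18", "018E13", "503L05", "544H24"]:
    print(clone, parse_clone(clone), well_index(parse_clone(clone)))
```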

Table 2 The list of primers for detecting the BAC clones

Sequence assembly and annotation of chorion genes

Low-quality bases (QV < 20) were removed with Phred13. After vector sequences were trimmed with cross_match, all paired-end reads were assembled with the programs Phrap 1.0808122214 and Consed 16.015. Both programs allowed the positions of misassembled clone sequences to be adjusted according to the insert size of the clones. Small gaps in the assembled sequence were filled by primer walking. The gene prediction program fgenesh16 was used to predict the chorion genes.
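Phred quality values encode the error probability of a base call as Q = -10 log10(P), so the QV 20 cutoff corresponds to a 1% chance that a base is wrong. One standard way to turn such a cutoff into a trimmed read is modified Mott trimming: score each base as P(cutoff) − P(error) and keep the maximum-scoring contiguous segment. The sketch below implements that algorithm to illustrate quality trimming in general; it is not claimed to reproduce Phred's own trimming options or parameters.

```python
from typing import Sequence, Tuple


def error_probability(q: float) -> float:
    """Invert Q = -10 * log10(P): a Phred quality q has error probability 10 ** (-q / 10)."""
    return 10 ** (-q / 10)


def mott_trim(quals: Sequence[int], cutoff_q: int = 20) -> Tuple[int, int]:
    """Modified Mott trimming (illustrative, not Phred's exact procedure).

    Returns the half-open interval [start, end) of the maximum-scoring segment,
    where each base scores P(cutoff) - P(error), positive for bases better than
    the cutoff. Returns (0, 0) when no base beats the cutoff.
    """
    limit = error_probability(cutoff_q)  # 0.01 for Q20
    best_score, best = 0.0, (0, 0)
    running, start = 0.0, 0
    for i, q in enumerate(quals):
        running += limit - error_probability(q)
        if running <= 0:
            running, start = 0.0, i + 1  # a segment ending here cannot help; restart after it
        elif running > best_score:
            best_score, best = running, (start, i + 1)
    return best


quals = [5, 5, 30, 30, 30, 30, 5, 5]
start, end = mott_trim(quals)
print(start, end)  # 2 6 -> keep bases 2..5 (0-based)
```

Scoring by error probability rather than by raw QV means that a single slightly-below-cutoff base inside a long high-quality run does not split the read.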

Data Records

Data record 1

The complete sequence of the chorion locus appears under DDBJ AB999997 (Data Citation 1).

Technical Validation

Probe selection and construction of BAC contig

Previous reports revealed that the chorion locus is composed of three types of clusters containing early, middle and late chorion genes3. We therefore selected representative ESTs of each of the three types of chorion genes to screen the BAC library (Table 1). Of the ten probes, eight were identified and oriented in the published B. mori genome7, and both end sequences of each BAC were used to confirm the orientation of the BACs. PCR with primers based on the BAC end sequences was used to confirm the orientation and position of the BACs in the chorion locus (Supplementary Fig. 1; see Table 2 for primer sequences). The PCR results showed that the target BACs overlapped sequentially and covered the whole chorion locus, except for a small gap region between BACs 076K18 and 077P06. We then obtained the sequence of this gap region from Bm_scaf166 in the silkworm genome sequence. Together, these strategies enabled us to establish a complete BAC contig covering the chorion locus.
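The PCR checks with BAC-end primers amount to a small sequence-tagged-site (STS) content map. Each end-primer pair acts as a marker, and a marker that amplifies from several BACs implies that those BACs overlap. A BAC order is consistent with the PCR results when, for every marker, the BACs that carry it sit next to each other (the consecutive-ones property). The sketch below brute-forces all orders, which is feasible for a handful of clones (7 BACs give 5,040 permutations). Marker and BAC names are hypothetical, and the reverse of any valid order is equally valid because presence/absence data carry no left/right information.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Set, Tuple


def is_consistent(order: Sequence[str], markers: Dict[str, Set[str]]) -> bool:
    """True if, for every marker, the BACs that carry it are adjacent in `order`."""
    position = {bac: i for i, bac in enumerate(order)}
    for carriers in markers.values():
        if not carriers:
            continue  # a marker seen in no BAC puts no constraint on the order
        idx = sorted(position[bac] for bac in carriers)
        if idx[-1] - idx[0] + 1 != len(idx):
            return False  # the carriers are split by some other BAC
    return True


def consistent_orders(bacs: Sequence[str], markers: Dict[str, Set[str]]) -> List[Tuple[str, ...]]:
    """Brute-force all orders (fine for a few BACs) and keep those matching the PCR data."""
    return [order for order in permutations(bacs) if is_consistent(order, markers)]


# Hypothetical PCR results: marker -> BACs that gave a product with that end-primer pair.
pcr = {"endA_right": {"A", "B"}, "endB_right": {"B", "C"}, "endC_right": {"C", "D"}}
print(consistent_orders(["C", "A", "D", "B"], pcr))
```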

Sequencing and assembly

In a first attempt to obtain the complete sequence of the chorion locus, we used the Ion PGM™, a second-generation sequencing platform characterized by low cost, high throughput and read lengths of up to 289 bp. However, because of the highly repetitive DNA sequences, individual BACs could not be assembled despite 150-fold coverage. To resolve the assembly, we constructed 2 kb and 5 kb shotgun libraries for each BAC and sequenced them by the Sanger method. This produced reads of up to 500 bp, which could cover the major exons of chorion genes (on the order of 500–800 bp). About 2,400 reads were generated for each BAC, covering the chorion locus approximately 10-fold. The positions of the BACs in the complete chorion locus are shown in Table 3.
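The read counts and coverage quoted above can be checked with simple arithmetic. Two shotgun libraries of about 590 clones each, sequenced from both ends, give 2 × 590 × 2 = 2,360 reads, consistent with about 2,400 reads per BAC. Fold coverage is C = N·L/G for N reads of mean length L over a target of length G. Under the Lander–Waterman (Poisson) model, the expected fraction of bases left uncovered is e^(-C). In the sketch, the 120 kb BAC insert and the 500 bp mean read length are assumptions chosen for illustration (500 bp is the maximum read length reported, so the real mean is lower). The Poisson model also assumes randomly placed reads, which repeats violate, so it can underestimate the gaps seen in practice.

```python
import math


def reads_per_bac(libraries: int, clones_per_library: int, reads_per_clone: int = 2) -> int:
    """Reads from the shotgun design; bidirectional sequencing gives 2 reads per clone."""
    return libraries * clones_per_library * reads_per_clone


def fold_coverage(n_reads: int, mean_read_length: float, target_length: float) -> float:
    """Redundancy of coverage C = N * L / G."""
    return n_reads * mean_read_length / target_length


def uncovered_fraction(coverage: float) -> float:
    """Lander-Waterman (Poisson) expected fraction of bases hit by no read: exp(-C).

    Assumes reads start uniformly at random; repeats and cloning bias break this.
    """
    return math.exp(-coverage)


n = reads_per_bac(libraries=2, clones_per_library=590)  # 2 x 590 x 2 = 2,360
# HYPOTHETICAL inputs: ~2,400 reads, 500 bp mean usable length, 120 kb BAC insert.
c = fold_coverage(2_400, 500, 120_000)
print(n, c, uncovered_fraction(c))
```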

Table 3 BACs and their position in the complete chorion locus

Annotation of chorion genes

The two EST libraries, constructed from day 4 pupal ovaries and day 8 pupal follicular cells, contained ESTs of all known chorion genes. Aligning these ESTs to the chorion locus further confirmed the existence of the predicted chorion genes.
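Confirming predicted genes with ESTs reduces to an interval-overlap question: does at least one EST alignment overlap each predicted gene on the locus sequence? The sketch below assumes ungapped alignment spans given as 1-based inclusive intervals, with hypothetical gene names and coordinates. Real EST-to-genome alignments are spliced, so a production check would compare exon blocks and strand rather than whole spans.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive on the locus sequence


def supported_genes(genes: Dict[str, Interval], ests: List[Interval], min_overlap: int = 1) -> List[str]:
    """Names of predicted genes overlapped by at least `min_overlap` bases of some EST."""
    ests_sorted = sorted(ests)
    starts = [start for start, _ in ests_sorted]
    supported = []
    for name, (g_start, g_end) in genes.items():
        # Only ESTs that start at or before the gene's end can overlap it.
        candidates = ests_sorted[: bisect_right(starts, g_end)]
        for e_start, e_end in candidates:
            overlap = min(g_end, e_end) - max(g_start, e_start) + 1
            if overlap >= min_overlap:
                supported.append(name)
                break
    return sorted(supported)


# Hypothetical predictions and EST alignment spans.
genes = {"gene1": (100, 600), "gene2": (1_000, 1_500), "gene3": (2_000, 2_400)}
ests = [(550, 900), (1_450, 1_460), (3_000, 3_500)]
print(supported_genes(genes, ests))  # gene3 has no EST support
```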

Usage Notes

The complete sequence of the chorion locus described here can be downloaded from DDBJ (accession AB999997). This Data Descriptor presents a strategy for obtaining precise sequence information for an extended region (>0.8 Mb) of a highly repetitive genome. The complete sequence of the chorion locus and the detailed gene annotation data are provided to enable users to study the developmental regulation of gene expression using the silkmoth chorion gene model.
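For users who want to retrieve the record programmatically, the sketch below parses FASTA text with the Python standard library and reports the length and GC content of each record. Because DDBJ, ENA and GenBank exchange records through the INSDC, it assumes that the AB999997 record can also be downloaded from the ENA Browser API FASTA endpoint. The helper names are hypothetical, and the download needs network access, so the example itself only parses an inline toy string.

```python
import urllib.request
from typing import Dict

# ENA Browser API FASTA endpoint; assumed to serve INSDC records deposited via DDBJ.
ENA_FASTA_URL = "https://www.ebi.ac.uk/ena/browser/api/fasta/{accession}"


def parse_fasta(text: str) -> Dict[str, str]:
    """Map the first word of each FASTA header to its upper-cased sequence."""
    records: Dict[str, str] = {}
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = (line[1:].split() or [""])[0], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous A/C/G/T bases."""
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def fetch_fasta(accession: str, timeout: float = 60.0) -> str:
    """Download one record as FASTA text (requires network access)."""
    url = ENA_FASTA_URL.format(accession=accession)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("ascii")


demo = parse_fasta(">seq1 toy record\nACGT\nggcc\n>seq2\nAT\n")
print({name: (len(s), gc_content(s)) for name, s in demo.items()})
```

With network access, `parse_fasta(fetch_fasta("AB999997"))` should return one record whose length can be compared with the 871,711 bp reported above.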

Additional Information

How to cite this article: Chen, Z. et al. Construction, complete sequence, and annotation of a BAC contig covering the silkworm chorion locus. Sci. Data 2:150062 doi: 10.1038/sdata.2015.62 (2015).