Whole-genome sequences of 37 breeding line Bombyx mori strains and their phenotypes established since 1960s

Bombyx mori is a key insect in the sericulture industry and an economically important animal that supports not only the livelihood of many farmers worldwide but also expanding biomedical uses. The National Institute of Agricultural Sciences of the Rural Development Administration of Korea (NIAS, RDA, Korea) has been collecting silkworm resources with various phenotypic traits since the 1960s and has established breeding lines to use them as genetic resources. These breeding line strains have been used to develop F1 hybrid strains suited to specific uses. In this study, we report the whole-genome sequences of 37 breeding line B. mori strains established over the past 60 years, along with a description of their phenotypic characteristics and photos of their developmental stages. In addition, we report example phenotypic characteristics of F1 hybrid strains produced from these breeding line strains. We hope these data will serve the research community as a valuable resource for studying B. mori and similar insects.

Measurement(s): Larval period • Cocoon yield • Single cocoon weight • Egg color • Cocoon shape • Number of cocoons per liter • Voltinism • Moltinism • Cocoon color • Larva pattern
Technology Type(s): Clock Device • scale • Eye • amount
Factor Type(s): Artificial feed • Target Season
Sample Characteristic - Organism: Bombyx mori
Sample Characteristic - Environment: house
Sample Characteristic - Location: South Korea


Background
The domestic silkworm, Bombyx mori (Lepidoptera: Bombycidae), was domesticated more than 5,000 years ago 1 . It is a key insect in the sericulture industry and an economically important animal that supports the livelihood of many farmers worldwide. Sericulture, the raising of silkworms to obtain silk, is a highly labor-intensive primary industry, and global production continues to fall because of declining output in China, which together with India accounts for the majority of the world's raw silk production (https://inserco.org/en/statistics). Nevertheless, B. mori remains one of the most important economic animals and is being adopted as a new source of income in some developing countries. Beyond its use as a silk source for the textile industry, the use of silkworms and silkworm by-products has expanded into drugs, tissue engineering, medical textiles, drug delivery systems, cosmeceuticals, food additives, and the manufacture of valuable biomaterials. The importance of B. mori as an animal resource is therefore increasing 2,3 .
Over a domestication period of some 5,000 years, silkworms have been bred through strong selection to have phenotypes suited to specific uses. Domesticated silkworms produce large amounts of silk, and some are known to produce 10 times more silk than Bombyx mandarina, which is regarded as the wild-type relative of B. mori 4,5 . However, as the sericulture environment changes and the uses of B. mori expand beyond silk production, strains with various phenotypes have the potential to serve many purposes as important biological resources. For this reason, even though silk production on general farms is decreasing in South Korea, national research institutes have continued to secure useful genetic resources by constructing breeding lines for various strains of B. mori. The National Institute of Agricultural Sciences of the Rural Development Administration of Korea (NIAS, RDA, Korea) has been collecting silkworm resources with various phenotypic traits since the 1960s and has established breeding lines to use them as genetic resources for F1 hybrids. Strains with various phenotypes can be used to enhance specific phenotypes for a given purpose through additional selective breeding and crossbreeding. They are also valuable biological resources for preparing against unexpected environmental changes, such as changes in feeding conditions. In addition, the whole-genome sequences of these strains, linked to their phenotypes, can serve as a major research resource for expanding our knowledge of the molecular background of B. mori.
In this study, we report the whole-genome sequences of 37 breeding line B. mori strains established over the past 60 years, along with a description of their phenotypic characteristics and photos. These whole-genome sequences, linked to the phenotypic characteristics of the established breeding lines, could be valuable resources for understanding the B. mori genome and could provide more insight into the molecular background of various phenotypes.

Methods
Construction and maintenance of breeding lines. For the 37 breeding line strains reported in this study, individuals with distinctive phenotypes were first produced through two-way or three-way hybridization of B. mori strains collected locally after the Korean War. All 37 strains were fixed as breeding lines for F1 hybrid production through selective self-crossing for a minimum of 10 generations, so that each strain maintains its specific phenotype continuously. Each established breeding line strain produces one generation per year: eggs are hatched and raised from the spring, and the eggs obtained through self-breeding are preserved. Egg incubation is carried out under 16 h of light at 15-26°C and 75-80% humidity. After hatching, instars 1-3 are raised at 25-26°C and 75-80% humidity, and instars 4-5 are raised at 23-24°C and 65-75% humidity. At all instar stages, mulberry leaves are fed 3 times a day to maintain the breeding line.

Library construction and data generation.
For whole-genome sequencing of the 37 breeding line strains, representative male individuals of each strain were randomly selected at the pupal stage. Epidermis tissue was isolated from the pupa and DNA was extracted using the QIAGEN DNeasy Blood & Tissue Kit. The extracted DNA was subjected to gel electrophoresis to check for DNA fragmentation, and Trinean, PicoGreen, and Bioanalyzer measurements were used to check the quality of the DNA. For the five tri-molt mutant strains (KRSM, SH, HS, S7 and SD), the sequencing library was constructed using the MGIEasy DNA Library Prep Kit according to the manufacturer's protocol, with a target library size of 500 bp. 150 bp paired-end data for these 5 strains were generated on the MGISEQ-2000 sequencing platform. Libraries for the remaining 32 strains were constructed using the Illumina TruSeq Nano DNA LT Kit, with a target library size of 700 bp. 150 bp paired-end data for these 32 strains were generated on the Illumina NextSeq 500.
Genomic variants and phylogenetic relationship using the p50T reference strain. Adapter sequences and low-quality bases were removed using Trimmomatic 6 , and the filtered reads were mapped to the reference p50T genome 7 from NCBI RefSeq using bwa-mem2 8 with default parameters. Removal of PCR duplicate reads and variant calling were performed using samtools 9 , and only biallelic single nucleotide variant (SNV) loci with no missing genotypes across the 38 samples, including the p50T strain, were extracted using VCFtools 10 . InDels and structural variants for each strain were identified using SvABA 11 . All identified variant information can be found at https://drive.google.com/file/d/1U3VVh_Q5ER-I6OtcpuqAunHZFtnbaQjG/view?usp=sharing (samtools) and https://github.com/asleofn/B_Mori/ (SvABA). Identified SNVs were annotated using SnpEff with a custom database built from the RefSeq annotation. The cladogram was constructed with the neighbor-joining algorithm using Tassel 5 12 .
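The site-filtering step described above (biallelic SNVs with no missing genotypes) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study, which relied on VCFtools and Tassel 5; the function names, the toy VCF records, and the simple allele-sharing distance are all assumptions made for illustration, and Tassel's own distance calculation may differ in detail.

```python
from itertools import combinations


def is_biallelic_snv(ref: str, alt: str) -> bool:
    """True for a single-base REF and exactly one single-base ALT allele."""
    return len(ref) == 1 and len(alt) == 1 and ref in "ACGT" and alt in "ACGT"


def dosage(genotype_field: str):
    """Alternate-allele count (0, 1, 2) from a GT such as '0/1'; None if missing."""
    gt = genotype_field.split(":")[0].replace("|", "/")
    alleles = gt.split("/")
    if len(alleles) != 2 or any(a not in ("0", "1") for a in alleles):
        return None
    return int(alleles[0]) + int(alleles[1])


def filter_vcf(lines):
    """Return sample names and dosage rows for biallelic SNVs with no missing calls."""
    samples, rows = [], []
    for line in lines:
        if line.startswith("##") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            continue
        if not is_biallelic_snv(fields[3], fields[4]):
            continue
        calls = [dosage(f) for f in fields[9:]]
        if any(c is None for c in calls):
            continue
        rows.append(calls)
    return samples, rows


def pairwise_distance(samples, rows):
    """Mean absolute dosage difference per site, scaled to the range 0..1."""
    dist = {}
    for (i, a), (j, b) in combinations(enumerate(samples), 2):
        diff = sum(abs(r[i] - r[j]) for r in rows)
        dist[(a, b)] = diff / (2 * len(rows)) if rows else 0.0
    return dist


# Toy records only: the genotypes below are invented, not taken from the data set.
DEMO_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tp50T\tKRSM\tJam124",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/0\t1/1\t0/1",    # kept
    "chr1\t200\t.\tC\tT,G\t50\tPASS\t.\tGT\t0/0\t1/2\t0/1",  # multiallelic, dropped
    "chr1\t300\t.\tG\tGA\t50\tPASS\t.\tGT\t0/0\t0/1\t0/1",   # InDel, dropped
    "chr1\t400\t.\tT\tC\t50\tPASS\t.\tGT\t0/0\t./.\t1/1",    # missing call, dropped
    "chr1\t500\t.\tG\tA\t50\tPASS\t.\tGT\t0/0\t1/1\t1/1",    # kept
]

if __name__ == "__main__":
    names, kept = filter_vcf(DEMO_VCF)
    print(len(kept), "sites kept")
    for pair, d in pairwise_distance(names, kept).items():
        print(pair, d)
```

A distance matrix of this kind is the usual input to a neighbor-joining tree builder; the tree construction itself is left to dedicated software.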

Data Records
The entire data set described in this study is deposited under NCBI BioProject accession PRJNA751387 13 and NCBI SRA accession SRP331034 13 ; the accession number for each sample can be found in Tables 1 and 2.

Technical Validation
Phenotypes and genome sequences of 37 breeding line strains of B. mori. Like other countries where B. mori is managed as an important economic animal, the NIAS, RDA, Korea has collected the various B. mori strains existing in South Korea since the 1960s and established breeding lines of these strains as genomic resources. In the early 1970s and 1980s, breeding centered on hardy, high silk-producing strains in order to increase silk production. From the 1990s, however, after Korea's rapid industrialization, the focus shifted to strains that can use artificial feed, require less labor, and are easily sexed using larval markings and cocoon colors, in order to cope with labor shortages and environmental changes. The 37 strains reported in this study are valuable as seed strains for developing customized hybrid strains that respond to changes in the sericulture environment and to requests from local farmers. Fig. 1 shows pictures of the egg, larva, cocoon, pupa, and adult of each of the 37 B. mori strains. Table 1 summarizes the whole-genome sequencing data generated for each strain, and Table 2 summarizes the phenotypic characteristics and breeding performance of the 37 breeding line strains. The minimum depth of coverage of the generated data was over 30X, based on a B. mori genome size of about 450 Mb.
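As a rough guide to what a 30X depth implies, the following Python sketch relates the number of 150 bp read pairs to the expected mean depth on a genome of about 450 Mb. The function names are illustrative, and the calculation assumes that every sequenced base contributes to coverage, which ignores trimming, duplicates, and unmapped reads.

```python
import math

GENOME_SIZE_BP = 450_000_000  # approximate B. mori genome size stated in the text
READ_LENGTH_BP = 150          # paired-end read length given in the Methods


def mean_depth(read_pairs: int, read_length: int = READ_LENGTH_BP,
               genome_size: int = GENOME_SIZE_BP) -> float:
    """Expected mean depth: total sequenced bases divided by genome size."""
    return read_pairs * 2 * read_length / genome_size


def pairs_for_depth(target_depth: float, read_length: int = READ_LENGTH_BP,
                    genome_size: int = GENOME_SIZE_BP) -> int:
    """Smallest number of read pairs whose total bases reach the target depth."""
    return math.ceil(target_depth * genome_size / (2 * read_length))


if __name__ == "__main__":
    pairs = pairs_for_depth(30)
    print(f"{pairs:,} read pairs give about {mean_depth(pairs):.1f}X")
```

Under these assumptions, 30X corresponds to 45 million read pairs, or 13.5 Gb of sequence per strain.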
Genomic variants for each strain were identified using samtools and SvABA. A total of 23,478,741 SNVs were identified by samtools, of which 1,506,850 (those with variant quality below Q30 or at multiallelic loci) were filtered out. Of the 21,971,891 SNVs remaining after filtering, 1,327,196 were located in CDS regions: 1,002,715 (75.551%) were synonymous variants and 324,481 (24.449%) were non-synonymous variants. In the InDel and structural variant calling performed with SvABA on individual strains, an average of 622,531 InDels and 41,348 structural variants were identified. All variant calling information is available through the links given in the Methods section. To examine the evolutionary relationship of the 37 breeding line strains together with p50T, phylogenetic analysis was performed using whole-genome variants from the generated sequencing data. Fig. 2 shows the phylogenetic relationship between the 37 B. mori strains reported in this study and the p50T reference strain. Of the five strains showing tri-molt characteristics, all except SH showed a close evolutionary relationship, and some strains were closely related despite differences in external appearance. This suggests that the external characteristics identified by eye are governed by a small portion of the total genomic variation, and more research will be needed to clarify the detailed association between genomic variants and these characteristics. Several previous studies have examined the phenotypes, genetic content, and regional populations of Bombyx mori 14,15 . However, this is the first population-level whole-genome data set released from South Korea, and the first data set containing details of the breeding performance and phenotypic characteristics of each individual strain. Combined with the existing data sets of previous studies, it allows a more extensive resource to be built for understanding the genetic background of silkworm phenotypes.
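The SNV counts reported in this section are internally consistent, which can be checked with a few lines of Python. The variable names are illustrative; the numbers are those stated in the text.

```python
# SNV counts as reported in the text
TOTAL_SNVS = 23_478_741     # all SNVs called by samtools
FILTERED_OUT = 1_506_850    # low-quality (under Q30) or multiallelic loci
CDS_SNVS = 1_327_196        # retained SNVs located in CDS regions
SYNONYMOUS = 1_002_715
NON_SYNONYMOUS = 324_481


def percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to 3 decimals."""
    return round(100 * part / whole, 3)


retained = TOTAL_SNVS - FILTERED_OUT

if __name__ == "__main__":
    print("retained after filtering:", f"{retained:,}")
    print("synonymous %:", percent(SYNONYMOUS, CDS_SNVS))
    print("non-synonymous %:", percent(NON_SYNONYMOUS, CDS_SNVS))
```

The retained count matches the 21,971,891 SNVs given in the text, and the synonymous and non-synonymous counts sum to the CDS total.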
The data reported in this study can also serve as a resource for marker development and are expected to help develop silkworm strains with desired traits in a short time through genomic breeding or genetic engineering.
F1 hybrid strains obtained from 37 breeding line strains. The NIAS, RDA, Korea has produced F1 hybrids with the required phenotypes using the 37 seed strains reported in this study, and the resulting F1 hybrid strains have been provided annually to local farmers. These hybrid strains are selected from several hybrid combinations and have various characteristics that respond to changes in the breeding environment or the purpose of use. Table 3 shows the breeding performance and characteristics of representative F1 hybrid strains constructed from the 37 breeding line strains. These strains have several important characteristics, the first of which is whether artificial feed can be used. The silkworm is a monophagous insect whose main diet is mulberry leaves, and producing, storing, and supplying mulberry leaves as feed requires a great deal of labor. Because sericulture follows the production season of mulberry leaves, the breeding period is limited to part of the year. If artificial feed can be used, harvested mulberry leaves can be utilized over a longer period and the labor required to prepare them is reduced. Increased production through year-round feeding can also be expected. Such strains are also very important in light of recent rapid climate change, because strains that accept artificial feed can cope flexibly with changes in mulberry leaf productivity. The second is a sex-limited inheritance strain, in which sex can be determined from the larval pattern or cocoon color. 
Silkworms can be sexed by examining the tail region during the 5th instar or the shape of the pupa, but if sexing is performed using the larval pattern or color, the labor required can be greatly reduced. The third is a hybrid strain that produces colored silk. Among the 37 breeding line strains, those producing yellow and light green cocoons have a smaller cocoon size than the general strains used for silk production. The hybrid strain therefore effectively improves on the low colored-silk production of the existing strains. In addition to the direct use of colored silk itself, these strains can be used as functional strains for the carotenoids or flavonoids required for colored silk formation. The fourth is a strain that does not produce a cocoon. The breeding line strain Jam307 in this study produces very few cocoons. Only about 1.2% of individuals produce fibroin-free, sericin-only nets. Dissection of the silk gland of this strain shows that the posterior silk gland, which is important for fibroin-based filament formation, is degenerated. In the Jam307 x Jam126 hybrid strain, which produces relatively large larvae and pupae compared with Jam307, most individuals form sericin nets and normal fibroin-containing silk is not produced. This suggests that the characteristic of Jam307, which produces silk composed only of sericin because of the degeneration of the posterior silk gland, is a dominant trait. This cocoon-free hybrid strain is mainly used in applications that use the silkworm itself, such as cordyceps production and silkworm powder for food additives. Lastly, the most recently developed strain is a hybrid strain of KRSM and Jam124. 
The phenotypic results were not included in Table 3 because the breeding performance evaluation has not yet been completed, but the KRSM x Jam124 hybrid strain has the following characteristics. It produces light green silk and shows tri-molt characteristics like B. mori KRSM, but its silk production is similar to that of general silk production strains. Fig. 3 shows the cocoons of the KRSM, Jam124, and KRSM x Jam124 hybrid strains. The cocoon size of the hybrid strain is similar to that of the silk production strain Jam124. In addition to the increased cocoon size, the total larval period was markedly shortened. Whereas KRSM and Jam124 have larval periods of 25.06 and 25.04 days.hrs, respectively, the total larval period of the hybrid strain was 20.04 days.hrs, about 20% shorter than that of the parental strains. Because a 20% reduction in production time can increase silk production as well as reduce production cost, the hybrid strain is being developed as a useful resource that can contribute to productivity improvement. In addition, the whole-genome sequences reported in this study can help provide more insight into the genetic background of B. mori phenotypes and support the development of modified strains for specific uses through genetic engineering.
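The reduction of about 20% in the larval period can be reproduced from the reported values. The Python sketch below assumes that the "days.hrs" notation means whole days followed by hours (so 25.06 is read as 25 days and 6 hours); this reading is an assumption, and the function names are illustrative.

```python
def days_hrs_to_hours(value: str) -> int:
    """Convert a 'days.hrs' string such as '25.06' to total hours (assumed reading)."""
    days, hours = value.split(".")
    return int(days) * 24 + int(hours)


def percent_shorter(parent: str, hybrid: str) -> float:
    """Percentage by which the hybrid larval period is shorter than the parent's."""
    parent_h = days_hrs_to_hours(parent)
    hybrid_h = days_hrs_to_hours(hybrid)
    return round(100 * (parent_h - hybrid_h) / parent_h, 1)


if __name__ == "__main__":
    # Larval periods as reported in the text, kept as strings to preserve '06' and '04'
    print("vs KRSM:  ", percent_shorter("25.06", "20.04"), "%")
    print("vs Jam124:", percent_shorter("25.04", "20.04"), "%")
```

Under this reading, the hybrid's larval period is 20.1% shorter than that of KRSM and 19.9% shorter than that of Jam124, in line with the figure of about 20% given in the text.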

Code Availability
All generated sequencing raw reads have been deposited in the NCBI Sequence Read Archive under accession PRJNA751387. The following commands were used to identify the phylogenetic relationship between breeding line strains.

Fig. 3 Cocoon of F1 hybrid offspring between male KRSM and female Jam124. All F1 hybrid offspring were tri-molt mutants with a short larval period and the cocoon size was similar to normal B. mori with LYG color.