Identification and genetic diversity analysis of a male-sterile gene (MS1) in Japanese cedar (Cryptomeria japonica D. Don)

Identifying causative genes for a target trait in conifer reproduction is challenging for species lacking whole-genome sequences. In this study, we searched for the male-sterility gene (MS1) in Cryptomeria japonica, aiming to promote marker-assisted selection (MAS) of male-sterile C. japonica to reduce the pollinosis caused by pollen dispersal from artificial C. japonica forests in Japan. We searched for mRNA sequences expressed in male strobili and found the gene CJt020762, coding for a lipid transfer protein containing a 4-bp deletion specific to male-sterile individuals. We also found a 30-bp deletion by sequencing the entire gene of another individual with the ms1. All nine breeding materials with the allele ms1 had either a 4-bp or 30-bp deletion in gene CJt020762, both of which are expected to result in faulty gene transcription and function. Furthermore, the 30-bp deletion was detected from three of five individuals in the Ishinomaki natural forest. From our findings, CJt020762 was considered to be the causative gene of MS1. Thus, by performing MAS using two deletion mutations as a DNA marker, it will be possible to find novel breeding materials of C. japonica with the allele ms1 adapted to the unique environment of each region of the Japanese archipelago.

In recent years, the tightly linked DNA markers to perform marker-assisted selection (MAS) of individuals with MS1 have been developed [20][21][22][23][24][25] . Mishima et al. suggested reCj19250 (the DEAD-box RNA helicase gene) as a candidate gene for MS1 in C. japonica 23 . However, 'Ooi-7' heterozygous for MS1 could not be selected with two SNP markers contained in reCj19250 22 , suggesting that reCj19250 was not the causative gene of MS1. Thus, to efficiently select individuals with ms1 using MAS from various C. japonica populations, it is essential to search for genetic mutations that can consistently explain the MS1 phenotype.
For the MAS of C. japonica trees that are homozygous or heterozygous for MS1 and the breeding of malesterile C. japonica trees adapted to the unique environments in each region of the Japanese archipelago, it is important to elucidate the diversity of genetic mutations responsible for MS1 and the phylogenetic origin of MS1. Thus, in this study, we conducted the following analyses: (1) to identify the genetic mutation specific to homozygous or heterozygous individuals for MS1 from mRNA sequences expressed in male strobili of C. japonica; (2) the functional annotation of candidate genes and deletion mutations causing ms1; and (3) construct the haplotype network and haplotype map of MS1 candidate genes using breeding materials with ms1 and trees from natural forests on the Japanese archipelago.

Results
RNA sequencing in the coding region of CJt020762. RNA-sequencing (RNA-Seq) data were analysed to identify the genetic variation specific to individuals with the recessive allele (ms1) of MS1 among the cDNA sequences expressed in male strobili of C. japonica. RNA-Seq data from six libraries (Table S1) were used 26 . The genotype of MS1 for these samples was determined based on phenotype and/or artificial crossing. cDNA sequences were screened using the following two criteria for MS1 candidate genes: (1) sequences expressed in male strobili but not in needles and inner bark and (2) sequences with twofold higher expression in male-fertile strobili compared to male-sterile strobili and vice versa were selected.
From the above procedure, 108 cDNA sequences were obtained as candidate genes of MS1. One of them, CJt020762, contained an SNP marker (AX-174139329; Hasegawa et al. 22 ) located in the same position (0 cM) as MS1 in the linkage map of the 'Fukushima-1' × 'Ooi-7' family. Furthermore, a 4-bp deletion causing a frameshift mutation was found in the coding region of CJt020762 ( Figs. 1 and 2). This deletion was not detected in wildtype homozygous individuals but was detected heterozygously in three heterozygous individuals for MS1, and homozygously in two male-sterile individuals. Therefore, the presence of the ms1 allele and this genetic mutation in the CJt020762 region were consistent with MS1. We hereafter focused our analysis on this gene.
CJt020762 codes for a lipid transfer protein gene containing a signal peptide, a plant lipid transfer protein domain, and a transmembrane domain in the coding region (Fig. 1). Furthermore, the glycosylphosphatidylinositol (GPI) anchor domain that contributes to fixing the protein on the outside of the plasma membrane was predicted at the C-terminus of CJt020762 (Figs. 1 and S1).
Genomic DNA sequence of MS1. Eight primers were designed based on the cDNA sequence of CJt020762 (Table S2), and the Sanger method was used to sequence the genomic DNA of 'Fukushima-1' ( Table 1). The genome sequence of CJt020762 was 2,556-bp long, containing three coding sequence (CDS) regions (Figs. 1 and S2).
Haplotype composition of CJt020762 in breeding materials and individuals from natural forests. The haplotype sequences of CJt020762 of nine breeding materials with ms1 and 74 individuals from 18 natural forest populations were determined. As a result, 49 haplotypes were detected from 83 individuals ( Fig. 3 and Table S3). Of these haplotypes, only two (No. 38 (ms1-1) and No. 4 (ms1-2)) contained the deletion www.nature.com/scientificreports/ in the coding region. In haplotype No. 38, the frameshift mutation caused by the 4-bp deletion occurred in the middle of the plant lipid transfer protein domain (Fig. 2). On the other hand, in haplotype No. 4, the deletion of amino-acids occurred in the transmembrane domain due to the 30-bp deletion (Fig. 2). Furthermore, the modification site of the GPI anchor domain on the C-terminal side, which is important for transmembrane functioning, was lost after the 30-bp deletion ( Figure S1). All individuals homozygous for MS1 (ms1/ms1) (i.e., 'Fukushima-1' , 'Fukushima-2' , 'Shindai-3' , 'Shindai-11' , 'Shindai-12' , and 'Tahara-1') had the homozygous haplotype No. 38 ( Fig. 3 Table S4). However, they were not detected in the breeding materials for ms1.

Discussion
In this study, we searched for genetic mutations specific to C. japonica breeding materials with ms1 using RNA-Seq data. Based on the results, we identified CJt020762 as a causative gene for MS1. There is additional existing evidence supporting this finding. Firstly, CJt020762 contains the SNP marker (AX-174139329) mapped on 0 cM from MS1 in the linkage map of the 'Fukushima-1' × 'Ooi-7' family 22 , indicating that CJt020762 was a gene in the   correspond to those in Fig. 3 and Table S3. The map was generated using R3. 5 www.nature.com/scientificreports/ vicinity of MS1. Secondly, CJt020762 was expressed in male strobili, where it is active. Thirdly, both 4-bp and 30-bp deletions are expected to cause dysfunction in this particular gene: the 4-bp deletion mutation deleted the amino-acid sequence of the plant lipid transfer protein domain, causing a frameshift, while the 30-bp deletion mutation deleted 10 (63%) amino-acids in the transmembrane domain (16 amino-acids). Furthermore, the modification site of the GPI anchor domain that contributes to fixing the protein to the outside of the plasma membrane was lost after the 30-bp deletion. These amino-acid sequence mutations are likely to result in the malfunction of the lipid transfer protein coded by CJt020762. Fourth, the cross between an individual with homozygous for the 4-bp deletion (ms1-1/ms1-1) and an individual with heterozygous for the 4-bp deletion (Ms1/ms1-1) resulted in 1:1 ratio of the male-sterile and male-fertile offspring in both of two families (i.e., 'Shindai-11' × 'Suzu-2' and 'Shindai-12' × 'Suzu-2') 27 . Futhermore, the male-sterile and male-fertile offsprings always had the 4-bp deletion homozygously and the 4-bp deletion heterozygously in CJt020762, respectively 27 . The fifth piece of evidence suggesting that CJt020762 is the causative gene of MS1 comes from the mapping family 'Fukushima-1' × 'Ooi-7' , where male-sterility occurred in one-half of offsprings 22 . Result of the DNA sequencing (Table S3) revealed that 'Fukushima-1' was homozygous for the 4-bp deletion haplotype (ms1-1/ms1-1) and that 'Ooi-7' was heterozygous, with the 30-bp deletion and wild-type haplotype (Ms1/ms1-2). Therefore, it was clarified that even if CJt020762 is heterozygous for the 4-bp deletion haplotype and 30-bp deletion haplotype (ms1-1/ms1-2), male-sterility occurs. Such a phenomenon occurs when CJt020762 codes a protein essential for pollen production and the protein function is lost due to the 4-bp deletion in the plant lipid transfer protein domain and the 30-bp deletion in the transmembrane domain. This result also indicates that both the plant lipid transfer protein domain and the transmembrane domain contained in CJt020762 is essential for pollen production in C. japonica. The sixth and final supporting evidence stems from the functional and structural similarity of CJt020762 with wheat male-sterility genes. In wheat male-sterility (Triticum aestivum), both ms1 and ms5 possess recessive inheritance and single locus control, hence male-sterility is caused by the failure of exine development during microspore formation [28][29][30][31] . MS1 in C. japonica and ms1 and ms5 in wheat have similar male-sterility phenotypes. In terms of protein structure, CJt020762 contains the signal peptide, plant lipid transfer protein domain, transmembrane domain, and the GPI anchor domain modification sites (Fig. 1). This structure is similar to ms1 and ms5 in wheat [29][30][31] . Furthermore, while the plant lipid transfer protein domain and transmembrane domain of CJt020762 may be necessary for pollen production in C. japonica, an analysis of mutant wheat revealed that both the lipid transfer protein domain and transmembrane domain of wheat ms1 were necessary for pollen production 30 . Because of the similarities with wheat male-sterility genes, CJt020762 is thought to be essential for pollen production, as well as ms1 and ms5 in wheat, despite their respective aminoacid sequences being different to each other, with percentage identity ranging from 27.0 to 31.4 for the highest scoring segment pairs in BLASTP. Based on above evidence, we propose that the CJt020762 section is the MS1 gene itself. We clarified the genomic DNA haplotypes of CJt020762 in 83 individuals, including nine breeding materials with ms1 and 74 individuals from 18 natural forests. Eight individuals ('Fukushima-1' , 'Fukushima-2' , 'Shindai-3' , 'Shindai-11' , 'Shindai-12' , 'Tahara-1' , 'Naka-4' , 'Suzu-2') with ms1-1 (haplotype No. 38) had a 4-bp deletion in the plant lipid transfer protein domain. Additionally, 'Ooi-7' with ms1-2 (haplotype No. 4) had a 30-bp deletion in the transmembrane domain. These results suggest that these deletions were caused by independent two genetic mutation events. In the Ishinomaki natural forest, haplotype No. 4 with the 30-bp deletion (ms1-2) was found in 3/5 individuals. This result indicates that individuals with haplotype No. 4 may be more frequent in the Ishinomaki natural forest. The distance between the selected location of 'Shindai-3' , 'Shindai-11' , 'Shindai-12' and that of 'Tahara-1' , 'Naka-4' was approximately 300 km and that between the selected location of 'Ooi-7' and the Ishinomaki natural forest was approximately 450 km. Therefore, the single genetic deletion mutations were dispersed over hundreds of kilometers. Furthermore, haplotype No. 4 (ms1-2) was derived from ancestral haplotype No. 2, which is distributed in three southern populations (Oki, Azouji and Oninome). As haplotype No. 4 was found on the Pacific Ocean side in the central and northern regions of Honshu island ('Ooi-7' and Ishinomaki forest), it may have expanded further north along the Pacific side. Using markers developed during this study, a number of individuals with ms1 can be selected from all over Japan in the future. Identifying the CJt020762 haplotype of these individuals and comparing it to the distribution of ancestral haplotypes will clarify the historical gene flow of ms1.
The findings of this study suggest that CJt020762 is the causative gene for MS1 of C. japonica and contribute to the MAS of MS1 for breeding of male-sterile trees. Although the 16 individual trees with ms1 have-until now-all been found in Japan 5,[11][12][13][14][16][17][18] , it is necessary to select more individuals by MAS to obtain breeding materials in each region, a C. japonica had differentially adapted to unique environments in different regions of the Japanese archipelago, such as around the Japan Sea (heavy snow in winter) or along the Pacific coast (dry in winter) 32 . To definitely prove that CJt020762 is the causative gene for MS1, it is essential to show that male-sterility occurs by knocking out CJt020762 in C. japonica through genome editing in future research.

RNA-sequencing and MS1 annotation.
To identify causative single nucleotide variations (SNVs) and/ or insertions or deletions (indels) in expressed genes, we analysed RNA-sequencing (RNA-Seq) data from six libraries (Table S1) from the mapping family of MS1. These data have been reported in Ueno et al. 25 and Wei et al. 26 In this study, we focused on the levels of gene expression and selected candidate genes using CJ3006All by Wei et al 26 as a reference. We compared the expression levels among the 10 libraries. When multiple sequencing runs (sampling stages) were included in each library, we considered these runs as repetitions and labeled them as "_rep1, " "_rep2, " and so on (see a supplementary table by Wei et al. 26 for details). Gene expression was quantified and compared by "kallisto 33 " and "sleuth 34 , " respectively, as described by Wei et al 26  www.nature.com/scientificreports/ were selected with the following criteria: (1) expression levels (in transcripts per million [tpm]) in leaf or bark tissue (Ooi-7_rep1) < 10 and male strobilus tissue (S1s_rep2 and S3s_rep3) > 10; (2) expression levels (tpm) in fertile strobili (S3s_rep3) was more than twice compared to those in sterile strobili (S4s_rep3) and vice versa. We selected 108 candidate genes (Table S5) for MS1, which were then examined for SNVs and indels. All of the read mapping of the RNA-Seq was checked manually using an IGV (Integrative Genome Viewer) 35 for homozygosity which was the only criteria after filtering by the expression levels before examining the variant effects. After we identified the causative functional mutation for MS1 from the RNA-Seq data ( Figure S2) and verified the marker (AX-174139329) position on the linkage map, we examined the translated amino-acid sequences of the candidate gene (CJt020762) for the functional domain and annotated using SignalP-5.0 36 Figure S3 and  Figure S3 and Table S2). Sequence alignment was performed using CodonCode Aligner v.8.0.2 software (CodonCode Corporation, Dedham, MA, USA), followed by manual editing.
Amplicon sequencing of CJt020762. We developed seven primer pairs using the PCR suite software 40 ( Figure S3 and . PCR was performed in a 20 μL volume containing 2 μL of 20-times diluted PCR products, 1 × KAPA HiFi HS ReadyMix and 4 μL of Access Array primers. The thermal profile for PCR was as follows: an initial denaturing step for 5 min at 95 °C, followed by 12 cycles of 15 s at 95 °C, 30 s at 60 °C and 60 s at 72 °C before a final elongation step at 72 °C for 3 min by using a GeneAmp 9700 PCR System (Applied Biosystems). After PCR, each DNA sample was purified using AMPure XP (Beckman Coulter, Brea, CA, USA). DNA concentration was determined using the Qubit fluorometer in the Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, Waltham, MA, USA). An equal amount of DNA from each sample was mixed to construct an amplicon sequencing library. The library was size-selected using BluePippin (1.5% agarose cartridge, Sage Science, Beverly, MA, USA) under the range of 400-700 bp. The DNA concentration of the library was determined using the LightCycler 480 Real-Time PCR System (Roche, Basel, Switzerland) with KAPA Library Quantification Kit (KAPA Biosystems). Finally, the library was sequenced in 2 × 251-bp paired ends on MiSeq (Illumina, San Diego, CA, USA). Reads from each individual were automatically classified on MiSeq according to the tag sequences (DRR206539-DRR206627). After reads were cleaned using the Trimmomatic programme 41 , they were mapped onto the genomic sequence of CJt020762 from 'Fukushima-1' using the BWA mem algorithm 42 . Each mapping file (BAM) was imported into the Geneious software (Biomatters Ltd., Auckland, New Zealand) and possible SNP sites were manually checked. The consensus sequence was exported, then haplotype sequences of CJt020762 were estimated using the Phase software 43,44 . Because C. japonica is a diploid and has two haplotypes per individual, the haplotype ID was named as individual name_1 or individual name_2 (Table S3). Furthermore, to clarify phylogenetic relationships among the CJt020762 haplotypes, haplotype networks were created using the PopART software 45 . Six individual trees (' Ajigasawa02' , 'Bijodaira03' , 'Tsuyama03' , 'Tsuyama07' , 'Yakushima02' , and 'Yamanouchi04') were excluded from the haplotype analysis as insufficient DNA sequences were obtained.