Discrimination of Camellia cultivars using iD-NA analysis

Recently, many new cultivars have been taken abroad illegally, which is now considered an international issue. Botanical evidence found at a crime scene provides valuable information about the origin of the sample. However, botanical resources remain underutilized as forensic evidence because molecular markers, such as microsatellites, are available only for a limited set of species. Multiplexed inter-simple sequence repeat (ISSR) genotyping by sequencing (MIG-seq) and its analysis method, identification of not applicable (iD-NA), have been used to determine several genome-wide genetic markers and are applicable to all plant species, including those with limited available genetic information. Camellia cultivars are popular worldwide, are planted in many gardens, and are often bred to make new cultivars. In this study, we aimed to analyze Camellia cultivars/species through MIG-seq. MIG-seq could discriminate similar samples, such as bud mutants and closely related samples that could not be distinguished based on morphological features. This discrimination was consistent with that of a previous study that classified cultivars based on short tandem repeat (STR) markers, indicating that the discrimination ability of MIG-seq is equal to or higher than that of STR markers. Furthermore, we observed previously unknown phylogenetic relationships. Because MIG-seq can be applied to any species and to low-quality DNA, it may be useful in various scientific fields.

relationship and population structure, along with cultivar identification and discrimination, which are not possible using previously reported sequencing techniques 32,33. Furthermore, a novel analysis method called identification of not applicable (iD-NA), which uses MIG-seq results, was reported recently 34. Unlike widely used programs that use complex processes and occasionally make errors when identifying SNPs (Fig. 1), iD-NA directly compares NGS reads to determine the exact matching rate between the target and query samples. These distinct characteristics indicate the potential of iD-NA for distinguishing among samples, including illegally used plant cultivars.
The genus Camellia contains more than 200 species and is mainly distributed in the southern and southwestern parts of China 35. Ornamental garden Camellia has gained substantial popularity as a rare winter-blooming evergreen tree grown in temperate areas. Therefore, the presence of Camellia garden plants can serve as evidence in criminal investigations, indicating the need for discerning their origin. The overwhelming majority of garden Camellia cultivars are derived from Camellia japonica L. Moreover, C. japonica subsp. rusticana (Honda) Kitamura, C. sasanqua Thunb., and C. reticulata Lindl. have also been utilized to develop hybrids of the majority of modern Camellia cultivars 35. Many cultivars have origins that are not clearly determined because they have been planted for hundreds of years. It is believed that over 100 named cultivars existed in Japan in the seventeenth century. During the seventeenth and nineteenth centuries, they were introduced into Europe, and many cultivars are currently bred not only in Japan but also in Europe. In the past several decades, the breeding of interspecific hybrids of the genus Camellia has resulted in the production of many diverse new cultivars. New genetic resources have been used to increase diversity among Camellia cultivars, and interspecific hybrid cultivars have been bred more consistently than ever before 36.
In cases where new cultivars have been unlawfully taken abroad, it becomes necessary to distinguish between the confiscated samples and the original cultivar for court proceedings. Furthermore, elucidating the genealogical relationships can aid in the efficient breeding of new cultivars. However, only a few studies have been conducted on Camellia garden plants [37][38][39], and the relationships among these plants have not yet been determined. In addition, intracultivar diversity has not been reported. For garden cultivated plants, cutting and seeding are often practiced to increase their number 40. Therefore, genetic diversity within a cultivar is not very high. Moreover, many Camellia cultivars are derived from bud mutation despite differing in their traits. For example, "Akaezo (red Ezonishiki)", "Shiroezo (white Ezonishiki)", and "Ezonishiki" are all bud mutants (Fig. 2). "Ezonishiki" has a variegated red-white flower, whereas "Akaezo" has a red flower and "Shiroezo" has a white flower.
In this study, we used MIG-seq to analyze cultivars/species within the genus Camellia, given the potential for discrimination among Camellia garden plants. MIG-seq could discriminate similar samples, such as bud

Figure 1. Schematic diagram of the discrimination procedure applied for identification using the iD-NA method. This method directly compares NGS reads and determines the exact matching rate between the target and query samples through three steps: (1) applying stringent filters to NGS raw data to refine sequencing reads, (2) identifying the sequencing reads that should be present in the target sample as reference, and (3) conducting a comprehensive search for sequencing reads that exactly match the target sample.
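The three steps listed in the Figure 1 caption can be illustrated with a minimal Python sketch. This is not the published iD-NA implementation: the function names, the length and base-composition filter, and the rule that a read counts as a reference read when it is observed at least `min_count` times are all assumptions made for illustration only.

```python
from collections import Counter

def filter_reads(reads, min_len=20):
    """Step 1 (assumed filter): keep reads that are long enough and contain only A/C/G/T."""
    return [r for r in reads if len(r) >= min_len and set(r) <= set("ACGT")]

def reference_reads(reads, min_count=2):
    """Step 2 (assumed rule): treat a read as reference if it occurs at least min_count times."""
    counts = Counter(reads)
    return {seq for seq, n in counts.items() if n >= min_count}

def matching_rate(target_reads, query_reads, min_len=20, min_count=2):
    """Step 3: fraction of target reference reads that occur exactly among the query reads."""
    ref = reference_reads(filter_reads(target_reads, min_len), min_count)
    if not ref:
        raise ValueError("no reference reads survived filtering")
    query = set(filter_reads(query_reads, min_len))
    return len(ref & query) / len(ref)

# Hypothetical toy reads (far shorter than real NGS reads, hence min_len=4).
target = ["ACGTAC"] * 3 + ["GGTTAA"] * 2 + ["TTTTNA"] * 5 + ["CCCGGG"]
query = ["ACGTAC", "CCCGGG", "GGTTAT"]

rate = matching_rate(target, query, min_len=4)
print(f"exact matching rate: {rate:.2f}")
```

In this toy case, the read containing N is discarded, the singleton read is not used as reference, and only one of the two remaining reference reads is found exactly in the query. A single-base difference ("GGTTAA" versus "GGTTAT") counts as a mismatch, which is what makes exact matching sensitive to small genetic differences.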

Comparison of SNPs obtained using MIG-seq among different cultivars
To investigate whether MIG-seq can discriminate between cultivars in detail, the output sequences were evaluated using another method called iD-NA. The high-quality sequences obtained from a sample were used as reference data, and the number of reference sequences detected in other cultivars was examined. If a given sample shares all super high-quality sequences of the counterpart, the value of similarity is 1.0. Samples of the same cultivar were found to share many sequences even if they were planted in different gardens (Fig. 4 and Table S1), as also evidenced by phylogenetic tree analysis. These results indicate that the iD-NA method is as effective as or more effective than the traditional method of using SNPs and phylogenetic analysis in identifying genetic differences between cultivars.
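The similarity value described above can be read as the fraction of one sample's reference sequences that are detected in another sample. The sketch below computes this for every ordered pair of samples. The sample labels, the sequence placeholders, and the data layout are hypothetical, and the sketch only follows the verbal definition given here rather than the actual iD-NA code.

```python
def similarity(reference, reads):
    """Fraction of the reference sequences that are present among the other sample's reads."""
    return len(reference & reads) / len(reference)

# Hypothetical samples: each has a set of reference (high-quality) sequences
# and the full set of reads observed in that sample.
samples = {
    "cultivarA_garden1": {"reference": {"s1", "s2", "s3", "s4"},
                          "reads": {"s1", "s2", "s3", "s4", "s5"}},
    "cultivarA_garden2": {"reference": {"s1", "s2", "s3", "s4"},
                          "reads": {"s1", "s2", "s3", "s4", "s6"}},
    "cultivarB_garden1": {"reference": {"s1", "s7", "s8", "s9", "s10"},
                          "reads": {"s1", "s2", "s7", "s8", "s9", "s10"}},
}

# Ordered pairs: the first sample supplies the reference, the second is searched.
matrix = {
    (t, q): similarity(samples[t]["reference"], samples[q]["reads"])
    for t in samples for q in samples if t != q
}

for (t, q), value in sorted(matrix.items()):
    print(f"{t} -> {q}: {value:.2f}")
```

Under this reading, the value depends on which sample supplies the reference, so the two directions of a pair need not agree. In the toy data the two cultivar A samples share all reference sequences in both directions, whereas the A-to-B and B-to-A values differ.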
Next, all samples were compared and validated (Fig. 5). Although the observed heterozygosity was lower than expected, indicating that inbreeding had occurred among the samples in this study (Table S2), the similarity between samples of the same cultivar differed from that between different cultivars. The average similarity between samples of the same cultivar was 0.98 ± 0.05 (mean ± s.d.), indicating that almost all super high-quality sequences could be observed in other samples of the same cultivar. Only three samples (KMA10H, SEZO1H and SGK101S) had moderate similarity (Table S1). KMA10H and other samples of the same cultivar shared a value of 0.74-0.80, which is lower than that of the other "Kumagai (KMA)" samples. KMA has several lines, which were bred in different regions of Japan 35. This raises the possibility that KMA10H belongs to a different line from the other samples. SEZO1H and the other samples of the same cultivar (SEZO113J and SEZO126J) shared a value of 0.65-0.74. They were sampled from different gardens, indicating that they were derived from different sources. SEZO is a white bud mutant of "Ezonishiki". To the best of our knowledge, no studies on the mechanisms of bud mutations of Camellia species have been conducted. However, previous studies suggested that white-petal varieties were obtained from the colored variety by cosuppressing the expression of genes encoding chalcone synthase and dihydroflavonol-4-reductase [41][42][43]. These genes are involved in the biosynthesis of anthocyanins, which are a colored class of flavonoids responsible for the pink, red, violet, and blue colors of flowers. It is possible that different factors resulted in white bud mutants, and these different lines were treated as the same cultivar because propagation might have changed during the course of hundreds of years. It is also possible that some ancient plant propagation methods might have increased the diversity within the same cultivar. These results suggest that MIG-seq can discriminate not only between cultivars but also between different lines within the same cultivar.
"Gigantea (GIG)", "Okinonami (OKI)", "Otometsubaki (OTTU)", "Setsugekka (Sgekka)", and "Ginryu (GIN)" could be discriminated using STR markers, as discussed in a previous study 37. Samples of the same cultivar shared a value of 0.99-1 (99-100%), whereas different cultivars shared a value of 0.14-0.57 (14-57%; Table 1). This indicates that the iD-NA method using MIG-seq exhibits comparable or superior discrimination ability to STR markers. Moreover, it enables the analysis of species that cannot be studied using intraspecies analysis methods, providing a high-resolution approach using iD-NA with NGS.
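The comparison of same-cultivar and different-cultivar similarities reported above (mean ± s.d., and non-overlapping ranges) can be summarized as in the following sketch. The cultivar labels and similarity values are hypothetical and are not taken from Table 1 or Table S1, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev

def summarize(pairs):
    """Summarize pairwise similarities for same-cultivar and different-cultivar pairs.

    pairs: iterable of (cultivar_of_reference, cultivar_of_query, similarity).
    """
    same = [v for a, b, v in pairs if a == b]
    diff = [v for a, b, v in pairs if a != b]
    return {
        "same": (mean(same), stdev(same)),        # sample s.d. (assumed)
        "different": (mean(diff), stdev(diff)),
        # True when every same-cultivar value exceeds every different-cultivar value.
        "separable": min(same) > max(diff),
    }

# Hypothetical pairwise values for two cultivars, A and B.
pairs = [
    ("A", "A", 0.99), ("A", "A", 1.00), ("B", "B", 0.97),
    ("A", "B", 0.40), ("A", "B", 0.30), ("B", "A", 0.50),
]

summary = summarize(pairs)
for group in ("same", "different"):
    m, s = summary[group]
    print(f"{group}: {m:.2f} ± {s:.2f}")
print("ranges do not overlap:", summary["separable"])
```

When the two ranges do not overlap, as with the 0.99-1 and 0.14-0.57 ranges reported here, any threshold between them separates the two groups. Samples with intermediate values, such as those discussed for KMA10H and SEZO1H, would need to be examined individually.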

Comparison of SNPs obtained using MIG-seq within the same and related cultivars
We investigated the differences among bud mutations, namely, "Ezonishiki" (original), "Akaezo" (red-petal mutant of "Ezonishiki"), and "Shiroezo" (white-petal mutant of "Ezonishiki"). The similarity between the original cultivars and red-petal mutants (0.98 ± 0.03) was almost the same as that between the original cultivars (0.98 ± 0.02), whereas more differences were found between the original cultivars and white-petal mutants (0.79 ± 0.03; Figs. 6, S1). This is in good agreement with the explanation provided by the park where "Ezonishiki" and "Ezoshibori" are planted, namely that in "Ezonishiki", changing from the original cultivar to the red bud mutant is easier than changing to the white bud mutant. It was reported that some genes or transposable elements introduced white color into colored flowers or some color into white flowers, resulting in variegated flowers [45][46][47][48][49]. This variegated flower color is valuable horticulturally because of its beauty, and a variegated flower is therefore treated as the original cultivar horticulturally. However, biologically original cultivars have a single color, such as a red flower. Changing to an absolutely different flower may be more difficult than changing to a variegated flower. These findings suggest that MIG-seq can detect intracultivar differences.
Significant morphological differences in sepals between "Ezonishiki" and "Ezoshibori (EZO1H)" have not been reported. Both "Ezonishiki" and "Ezoshibori" have variegated red-white flowers (Fig. 2a,d). Moreover, the park where "Ezoshibori" is planted explained that "Ezoshibori" is not a common cultivar name, and therefore, "Ezoshibori" may be a synonym of "Ezonishiki". However, the similarity between "Ezonishiki" and "Ezoshibori" (0.75 ± 0.03) was lower than that among the original "Ezonishiki" samples (0.98 ± 0.02; Figs. 6, S1). Our results suggest that they do not form a monophyletic group and that the degree of difference was comparable to that of the white-petal mutant (0.79 ± 0.03). These findings indicate that iD-NA with MIG-seq could discriminate between closely related groups that could not be distinguished based on morphological features.
To investigate whether iD-NA could discriminate closely related cultivars, we compared "Ezonishiki" and "Soshiarai" based on SNPs via phylogenetic analysis. "Soshiarai" was the most closely related cultivar of "Ezonishiki" (Fig. 3). "Ezonishiki" and "Soshiarai" shared a value of 0.81 ± 0.03, whereas "Ezonishiki" cultivars shared a value of 0.98 ± 0.02 (Figs. 6, S1). These results indicate that iD-NA could discriminate closely related samples as expected. Our method can determine clear differences between the samples according to the genetic distances.
In this study, we analyzed Camellia cultivars or species using iD-NA with MIG-seq. We found that iD-NA could discriminate very similar samples, such as bud mutations and closely related samples, although the samples had a possibility of inbreeding. MIG-seq and iD-NA are not limited to particular species. Any species can potentially become a forensic sample. Although we cannot develop polymorphic markers for every species, iD-NA with MIG-seq can be used to facilitate criminal investigations. Moreover, this method can evaluate unknown samples with regard to biological features, polyploidy, heterogeneity, and mating patterns, such as selfing, apomixis, and vegetative reproduction. The similarities are easy to compare using simple programs and parameter settings, and the differences can be easily visualized and analyzed. For example, a phylogenetic tree is useful for visually understanding the relationship between samples; however, SNPs for the phylogenetic tree obtained using MIG-seq without a reference genome depend on many parameters of SNP calling and filtering

Table 1. Pairwise comparisons among cultivars discriminated using STR markers, as discussed in a previous study 37. The high-quality sequences obtained from a sample were used as reference data, and the number of reference sequences detected in other cultivars was examined. If a given sample shares all super high-quality sequences of the counterpart, the value of similarity is 1.0.

Figure 3. Dendrogram for Camellia cultivars generated via MIG-seq analysis. * indicates flowers with a spotted pattern of red and white. The numbers at the branches are bootstrap values.

Figure 4. Pairwise comparison of Camellia cultivars/species. Histogram showing the shared sequences between two samples. Darker color indicates that more shared sequences were obtained. The sample IDs are shown on the top and left.

Figure 5. Histogram showing the distribution of the ratio of shared sequences between the two samples. (a) Among all samples; (b) among the same cultivars; (c) among different cultivars; (d) average of the samples. If two samples have a closely related phylogenetic relationship and share all high accuracy sequences of the counterpart, the value is 1. The error bar indicates standard deviation.

Figure 6. Average of the ratio of the shared sequences among "Ezonishiki" and the related cultivars. If the two samples have a closely related phylogenetic relationship and share all high accuracy sequences of the counterpart, the value is 1. The error bar indicates standard deviation.