Identification of hybridization and introgression between Cinnamomum kanehirae Hayata and C. camphora (L.) Presl using genotyping-by-sequencing

Cinnamomum kanehirae Hayata and C. camphora (L.) Presl are important tree species in eastern Asia. The wood of C. kanehirae is in increasing demand for culturing Antrodia cinnamomea, a medicinal fungus that naturally grows inside the trunk of C. kanehirae. Putative hybrids between C. kanehirae and C. camphora were previously reported but with no scientific evidence, leading to confusion or misplanting. First, to identify the female parent of putative hybrids, the maternal inheritance InDel (insertion/deletion) markers were developed by using low-coverage sequencing. SNPs were developed by using genotyping-by-sequencing (GBS) approach in C. kanehirae, C. camphora and putative hybrids. The results indicated that the female parent of the studied hybrids was C. camphora. Eight hundred and forty of the 529,006 high-density SNPs were selected and used for analysis. Hybrids were classified as F1 (C. kanehirae × C. camphora), F2 and backcrosses. Hybridization has occurred in the human-developed area of eastern and southwestern Taiwan, and the introgression was bidirectional. For producing pure wood, buffering zones should be established around seed orchards to avoid cross-species pollination and to preserve the genetic purity of C. kanehirae. The DNA markers developed in this study will also be valuable for further wood identification, breeding and evolutionary research.


Results
The validation of InDel markers. Low-coverage sequencing was performed in three individuals of two Cinnamomum species: C. kanehirae (S99), C. camphora (JLH2) and C. camphora (JZS). Statistical data from this low-coverage sequencing are available in Supplementary Table S1 online. In C. kanehirae, we obtained the greatest number of clean bases after trimming. In total, we found 481 InDels that satisfied our filtering parameters. We validated the five longest InDels in three data sets. Finally, 7 out of 15 validated InDels (46.67%) were polymorphic in C. kanehirae and C. camphora. Among the 7 InDels, 1 had blastn matches to nuclear mRNA, 2 had blastn matches to mitochondrial sequence, 1 had blastn matches to chloroplast sequence, and 3 had no blastn matches (see Supplementary Table S2 online). The InDel marker named ZS_CK InDel_1 is annotated to mitochondrial DNA. All of the genotypes of ZS_CK InDel_1 in putative hybrid samples showed the same profile as C. kanehirae, different from the C. camphora profile (Fig. 1A,B). However, the marker JLH2_CK InDel_4 is recognized as a nuclear and codominant marker, and its profile in all putative hybrids (F1) showed two clear double bands (152 bp and 227 bp), where one DNA fragment is identical to C. kanehirae (152 bp) and the other DNA fragment is identical to C. camphora (227 bp) (Fig. 1C,D). However, three genotypes were obtained (152 bp, 227 bp, or 152 and 227 bp) in other putative hybrids (F2, B × 0 and B × 1, see Fig. 1E,F). Since the individuals were classified as F2 or backcrosses, which underwent genomic recombination when crossed, the codominant nuclear genes will show a biparental or uniparental genotype. In the present InDel data, we also found that the populations of C. camphora showed different genotypes in the chloroplast InDel (ZS_CK InDel_2) (Fig. 1G). The C. camphora can be clustered into two geographical collection areas: the ESW (east and southwest) and NW (north and west, including Japan and Vietnam) populations. The genotyping of C. kanehirae and ESW C. camphora are identical (124 bp) but different from that of NW C. camphora (212 bp). This is in accordance with other studies, which observed both kinds of chloroplast sequences in C. camphora (NCBI accession numbers: MF421523 and MG021326).
Genetic diversity and SNP variation. The Table 2). P values were based on 9,999 permutations and are significant (P < 0.0001).
Phenotypic analysis, genetic structure and hybrid classification. In the morphological trait analysis, we collected and measured 229, 109 and 109 fruits from C. camphora, C. kanehirae and putative hybrids, respectively. The back-leaf properties of these samples were also recorded and described. All of the putative  Supplementary Table S5 online. www.nature.com/scientificreports/ hybrids have oblate shaped fruits, as do C. kanehirae, and the back-leaf is covered with a powder-like white wax but less noticeably than in C. camphora leaves (see Supplementary Fig. S2 online). The ratios of the fruit length and width (L/W ratio) are available in Supplementary Table S3 online. The results indicated that the L/W ratios are 0.762, 0.867 and 0.992 in C. kanehirae, putative hybrids and C. camphora, respectively. The fruit shape of putative hybrids and C. kanehirae is oblate, while the fruits of C. camphora are almost round. The fruits of C. kanehirae and putative hybrids are similar in shape and larger than those of C. camphora. The differences in the fruit shapes of C. kanehirae and C. camphora have been qualitatively described in previous publications 22,23 . The characteristics of the back-leaf in putative hybrids are intermediate between those of C. kanehirae (glossy and bright) and C. camphora (covered with a powder-like white wax). The individual putative hybrids are seedlings of fruits from the C. kanehirae. Thus, we could infer based on morphological traits that the putative mother trees are C. kanehirae and the putative male ancestor may be related to C. camphora. However, no molecular evidence was used previously to illustrate the relationship of hybridization and the level of introgression of C. camphora and C. kanehirae.
The neighbor-joining tree of 808 SPNs was constructed with 1000 bootstraps (Fig. 2). It clearly shows that C. kanehirae, C. camphora and putative hybrids form three clusters. The bootstrap numbers on the nodes splitting putative hybrids, C. kanehirae and C. camphora were 1 and 0.994, which are high enough to support the clustering. This implies that the putative hybrids are genetically distinct from C. kanehirae and C. camphora. The PCoA analysis of 808 SNPs represented 83.95% (sum of 65.84% + 18.12%) of the total variance (Fig. 3). The ovals around the sample points are the results of the morphological and phylogenetic analysis. C. kanehirae, the putative hybrids and C. camphora were clearly separated into three different groups. This PCoA result is consistent with phylogenetic analysis and illustrates the genetic distance of the hybrids. The C. camphora accessions were also divided into two subgroups, i.e., the NW and ESW populations.
All samples were included in clustering analysis with the STRU CTU RE software, and the number of clusters (K) was estimated to be 2 when delta K statistics were given by the STRU CTU RE Harvester 24 . We examined the genetic structure assuming different Ks between 2 and 4 ( Fig. 4). The results showed that when K = 2, C. kanehirae and C. camphora are clearly separated in two groups (red and green), and all of the putative hybrids were genetic composition admixtures from both species. Fifty pure individuals (19 C. kanehirae and 31 of C. camphora) had Q-values ≥ 0.9, and 25 putative hybrids had Q-values from 0.1 to 0.9. Therefore, we classified them as hybrids. When the cluster number (K) was equal to 3, the cluster including all C. camphora was divided into two clusters (green and blue). When K = 4, the genetic structure only changed in the red cluster (C. kanehirae and putative hybrids). Hybrid individuals with Q-values close to 0.5 were interpreted to be the F1 (C. kanehirae × C. camphora) and F2 (F1 × F1). Hybrid individuals with Q-values close to 0.25 or 0.75 may be interpreted as backcrossed to C. kanehirae or C. camphora (F1 × C. kanehirae or F1 × C. camphora). In the analysis using NewHybrid software, all of the posterior probabilities of C. camphora and C. kanehirae were 1 (Fig. 5). This indicated that all of the C. kanehirae and C. camphora are pure.

Discussion
Within this study, the InDel markers were used to rapidly and successfully detect the hybrids and their parents. InDel markers are codominant and relatively abundant, and they offer easy direct detection by PCR and subsequent gel electrophoresis 25,26 . In most angiosperms, mitochondrial DNA is maternally inherited 27 . The InDel markers in this study are suitable for detecting F1 hybrids but not F2 or backcross hybrids. The InDel data from the mitochondria also revealed that in these hybrids, C. kanehirae is the mother tree, and C. camphora is the donor parent. Taiwan was a possible site for the origin and differentiation of the Cinnamomum sect. camphora 28 . Our data also reveals that the genotypes of east and west C. camphora are different. This is consistent with the previous report that the C. camphora in eastern Taiwan is unique and genetically distinct from the C. camphora in western Taiwan and Japan 29 . These genotyping data are partially in accordance with previous reports. Kameyama et al. used SSR markers to calculate the genetic diversity of C. camphora from China, Japan and Taiwan, and the data showed that the C. camphora in Japan showed a high genetic diversity compared to genotypes from parts of China and Taiwan. However, the report 30 lacked C. camphora individuals collected from ESW Taiwan. The topic of geographic genetic variance in C. camphora seems important for further investigation in the future.
In present study, we generated 529,006 SNP sequences in the SNP calling. However, the proportion of missing sites reached 0.95 in samples of JMC and JLH2, and we removed the SNPs with missing data for high accuracy. The missing data may be relative to mutations in the recognition site of the enzyme, insertion/deletion sequences or sequencing quality. Removing all of the missing data helps to ensure the accurate SNPs for further analysis, but it dramatically reduced the size of the observed GBS dataset 31 . Regarding genetic analysis, expected heterozygosity (He) is also known as gene diversity, which is a measurement for understanding the genetic variation in www.nature.com/scientificreports/ a population. In this study, the He values of C. kanehirae and C. camphora are much lower than those obtained in a previous report 12,30,32 , although all of the previous reports now reveal that the C. kanehirae had a low level of genetic diversity in Taiwan due to the overlogging and destruction or fragmentation of the natural habitat. One of the reasons explaining the lower diversity of C. camphora may be that only the widely distributed giant trees (estimated older than 100 years old) were collected and analyzed in this study. The expected heterozygosity (He) in putative hybrids was higher than those in C. kanehirae and C. camphora. The possible reason for this result is that the genomic recombination occurs in putative hybrids and then increases the genetic diversity. The hybridization may increase the genetic diversity 33,34 . The F is (inbreeding coefficient) is used to gauge the strength of inbreeding. Positive F is values also suggest a significant deficiency of heterozygosity in C. kanehirae and NW C. camphora, where inbreeding leads to a decrease of genetic diversity 12,32 . The F is of putative hybrids was negative, implying a heterozygote excess, which is possibly due to hybridization 35     www.nature.com/scientificreports/ C. camphora, consistent with their being taxonomically different species. A high level of genetic differentiation between putative hybrids and C. kanehirae (0.304) and NW C. camphora (0.267) was found. Hybrids had almost moderate genetic differentiation between C. camphora (0.208-0.267) and C. kanehirae (0.304). This implies that the hybrids are equally close to C. kanehirae and C. camphora (Table 2). Interestingly, the level of genetic differentiation between putative hybrids and ESW C. camphora (0.208) is lower than that between putative hybrids and NW C. camphora (0.267). This implies that the putative hybrids are genetically closer to ESW C. camphora than to NW C. camhora. The two C. camphora groups (NW and ESW C. camphora) also show a moderate level of differentiation from each other but high genetic diversity within C. camphora populations 36 . This may be due to the geographical barrier of the Central Mountain Ridge in Taiwan, and thus, the natural gene flow of C. camphora between east and west may be difficult. Most local plant species exhibit some genetic differentiation because of the Central Mountain Ridge in Taiwan 37 . This result is consistent with the report of Professor Naonori Hirota, in which the eastern camphor trees were found to be different from the those in western Taiwan 29 . In the STRU CTU RE analysis (Fig. 4), when K = 3, the green and blue indicate the samples of ESW C. camphora and NW C. camphora, respectively. This finding of two subpopulations of C. camphora is also in accordance with our InDel and PCoA results. The Central Mountain Ridge in Taiwan may cause reproductive isolation between eastern and western areas in Taiwan. It clearly indicated that the hybrids originated from ESW C. camphora in the region of eastern or southeastern Taiwan because the hybrids have a genetic composition reflecting ESW C. kanehirae and C. camphora admixture. Figure 5 shows that three individuals (TR4, LC5_21 and LC5_26) were classified as F1. This result is consistent with the codominant InDel with double bands in Fig. 1C,F. TR4, the street tree, has the oblate fruit and back-leaf covered with a light powder-like white wax. There are other five putative hybrids (TR1, TR5, TL2, TL3 and TL5) and several C. kanehirae and C. camphora trees planted nearby TR4 (see Supplementary Fig. S3A online). C. kanehirae trees (TR2, TR3, TL1, and TL4) nearby TR4 did not fruit during our investigation period from 2015 to 2019. However, C. kanehirae trees may be in sparse bloom on the top layer of the trees, which we did not observe. C. camphora is mostly pollinated by insects from the order Diptera, which can carry pollen for long distances, at least 2 km, and transport viable pollen over at least 400 m [38][39][40][41] . The closest C. camphora trees are approximately 80 m away from TR4. Therefore, TR(L)i_j (i.e., TR4_1, TR5_2, and TL2_5) have been classified as F1 and backcross (F1 × C. camphora or F1 × C. kanehirae). LC5_21 and LC5_26 were collected from a C. kanehirae tree (LC5) in the nursery, and artificial embryo culture was used to produce the seedlings. Evidence supported the natural hybridization in the individuals collected in the present experiment and was consistent with the surroundings. For instance, in the surroundings of LC5, several C. camphora trees were planted nearby, at approximately 100 m away, and the blooming times of both trees in the collected year were simultaneous. We used artificial embryo culture to produce the siblings LC5_21 and LC5_26, and other siblings (LC5_3, 5 and 8) exhibit similar morphological characteristics to C. kanehirae. The hybrid individuals collected and embryo cultured from the National Museum of Natural Science (NMNS) are F2 and B × 1 or B × 0 because the collected trees there were planted very close to each other in an array along with C. kanehirae (see Supplementary Fig. S3B  online). The C. kanehirae tree (K9R) at NMNS bloomed annually from 2014 to 2018. Thus, it is reasonable that the offspring of hybrids in NMNS are F2 and B × 1 or B × 0 (K13R_3). These data indicated that the hybridization events were spontaneous and random.
The molecular and morphological data of hybrids in this study supported the occurrence of hybridization, which shows that the species boundaries are partly permeable. Both the STRU CTU RE and the NewHybrid analyses showed that these individuals (F1 hybrids, F2 and backcrosses to either C. kanehirae or C. camphora) were morphologically intermediate between C. kanehirae and C. camphora. Our data implied that introgression is bidirectional in these two species, especially in human-modified areas (street trees or nursery). We know that the C. camphora and C. kanehirae rarely display synchronous blossoming in natural fields at lower elevation. Thus, a premating barrier between C. kanehirae and C. camphora is reasonable. We observed the C. kanehirae, C. camphora and hybrids bloomed synchronously in the nursery and among domesticated (street) trees. It could be possible that with the unusual climate for C. kanehirae, the premating barrier became weak, allowing some proportion of these related species to hybridize. Our data are of relevance because the C. kanehirae wood has high market value for use in culturing A. cinnamomea. Although hybrids might be useful in breeding programs and may outcompete the parent species 17,42 , the experimental evaluation of hybrid wood quality is still necessary in the future to ensure the alternative utilization of the hybrids. Therefore, from the conservation point of view, before the evaluation of the effects in the hybrid population, it is better to limit the dispersal of the hybrids in the nursery.

Conclusion
In this study, we provided useful InDels to quickly identify C. kanehirae, C. camphora and their F1 hybrids (C. kanehirae × C. camphora). The genetic analysis with 808 SNPs also supports the occurrence of hybridization between C. kanehirae and C. camphora. For the conservation and breeding of C. kanehirae and C. camphora, further studies are necessary to ensure the properties of hybrids, including the wood quality, adaptive ability and so on. We infer that many hybrid seedlings appear in the market and will extend to natural fields in the future. Hybrid woods may negatively affect the commercial value of pure C. kanehirae wood. Thus, we recommend that buffer zones be built around the seed orchard and nursery of C. kanehirae to avoid cross-species pollination and to preserve the purity of the species. A suitable method for the rapid molecular detection of hybrids is also necessary to preserve and control seedling production in the future.

Material and methods
Plant materials and DNA extraction. The details of the plant samples used in this study are listed in Supplementary Table S4 and Supplementary Fig. S4 online. All of the C. kanehirae leaves were originally collected from natural habitats all over of Taiwan. Twenty-nine C. camphora samples were collected from giant trees (predicted ages are all over 100 years old in order to avoid sampling human-planted C. camphora) in Taiwan, and two C. camphora samples were collected in Japan, while one was collected in Vietnam. Putative hybrids were collected from among street trees near the campus and museum or in the seedling nursery. The samples labeled with underscores were collected as seeds, and then, the plant was obtained by using artificial embryo culture. The method of artificial embryo culture to generate the plant from fruit trees of C. kanehirae and putative hybrids followed our previous publications 43,44 . Samples were labeled as follows: TR1_5 was a seed collected from a TR1 mother tree; K9R_1 and K9R_3 were the seeds collected from a K9R mother tree; TR4_1 and TR4_4 were seeds collected from TR4, and so forth. The DNA extraction of leaf samples followed the CTAB method 45 . All of the samples were collected and immediately stored in a -80 ℃ freezer.
Bioinformatic and genotyping analysis. DNA from the 75 plant samples was prepared for GBS and sequencing library construction following the previous report 5  Genetic analysis and population structure. The genomic reference-based method used the GATK procedure for SNP calling with only read 1 of the sequencer outputs. The accession number of the reference genome we used is GCA_003546025.1 (from scaff001 to scaff0012 with a total of 672,841,857 bp, 92% of the reference genome) in GenBank 19 . TASSEL v5.0 was used for genotyping summary and SNP filtering 47 . For obtaining real and accurate SNP sites for further analysis, we removed the SNPs that contained missing data by using TAS-SEL. The GenAlEx v6.5 was used to calculate observed and expected heterozygosity for each of the nuclear SNP markers. A principal coordinate analysis (PCoA), inbreeding coefficient (F is ) and level of genetic differentiation (F st ) were also calculated by using GenAlEx v6.5 48 . The phylogenetic tree was constructed using MEGA 6.0 by the neighbor-joining (NJ) method with 1,000 bootstrap replicates 49 . The genetic structure was investigated using Bayesian cluster analyses with the STRU CTU RE software v2.3.4 50,51 . An admixture model was employed in which correlated allele frequencies were assumed and the K-value (i.e., the number of clusters) was set from one to seven. The length of the burn-in period was set to 10,000, and the Monte Carlo Markov Chain (MCMC) model after burn-in was run for an additional 100,000 iterations. For each K, 20 replicates were run, and the best delta K was obtained by using STRU CTU RE harvester online software 24 .

Identification and classification of hybrids.
For morphological trait analysis, we examined the color of the back-leaf and the fruit size in some of the C. kanehirae, C. camphora and the putative hybrids. All of the pure species were validated by codominant species-specific InDel markers (JLH2_CK InDel_4). F1, F2 (F1 × F1) and backcross (B × 0: F1 × C. kanehirae; B × 1: F1 × C. camphora) hybrids were identified using the SNP genotypic data with the software STRU CTU RE and NewHybrids 50,52 . The Q-value from STRU CTU RE, which shows the proportion of an individual's genome that originates from each of the K clusters, was used as a hybrid index. Thus, for the best delta K, individuals with Q-values between 0.1 and 0.9 were considered to be hybrids or backcrosses, while individuals with Q-values ≥ 0.9 were considered to be pure species. This coincides with previous studies 2, 17,53,54 . NewHybrids was used to classify individuals as pure parent species, F1s, F2s or backcrosses. NewHybrids computes the posterior probabilities that an individual belongs to these different classes.
For NewHybrids, we used the default genotype categories for first and second generations of crossing and ran 100,000 sweeps of five chains started from over dispersed starting values after a burn-in period of 50,000 sweeps. Jeffrey-type priors were used for the mixing proportions and allele frequencies. To ensure the successful convergence of NewHybrids, the C. kanehirae (B3) and C. camphora (JAH) were set to individual-specific prior (z option) for pure parents.