Introduction

The Japanese Archipelago consists of four major islands (Hokkaido, Honshu, Shikoku and Kyushu) and many smaller islands, which can be grouped into nine regions (Supplementary Figure 1). Waves of human migration from the Eurasian continent to the archipelago occurred from at least 30 000 years before present (YBP),1 along various routes.2 These migrations shaped the population structure of the Japanese Archipelago, which currently comprises three main populations: the Ainu, who live mainly in Hokkaido, the northernmost island; the Ryukyuans, who live mainly in the Ryukyu Islands in the south; and the Mainland Japanese, the largest population, who live on all four major islands as well as on smaller islands.

From an archeological perspective, the prehistory of the Japanese Archipelago can be divided into the Paleolithic period (older than 16 000 YBP), the Jomon period (16 000–3000 YBP) and the Yayoi period (3000–1700 YBP).1 The currently accepted model for the origin of the Japanese populations is the dual-structure model,3 according to which the current Japanese population resulted from admixture between early migrants (the Jomon people) and later migrants (the Yayoi people), with the Ainu and the Ryukyuans thought to retain more Jomon components than the Mainland Japanese. Subsequent studies using mitochondrial DNA and several autosomal markers have generally agreed with the dual-structure model, showing the admixed nature of the Mainland Japanese4, 5 and demonstrating close affinities between the Ainu and Ryukyuan populations.6, 7

The Japanese Archipelago Human Population Genetics Consortium8 generated genotype data for ~900 000 genome-wide single-nucleotide polymorphisms (SNPs) in the Ainu and the Ryukyuans and, through principal component analysis (PCA) and phylogenetic tree construction, demonstrated a clear genetic similarity between these two groups despite their current locations at opposite ends of the Japanese Archipelago. Analyses of individual ancestry proportions and phylogenetic analysis of the Mainland Japanese also showed that they carry both Ainu-Ryukyuan and continental Asian genetic components. A recent study using a model-based approach also favored the dual-structure model.9

Although previous studies generally support the dual-structure model, the amount of genetic contribution from each ancestral population remained poorly defined. We also wanted to identify the factors that contributed to the genetic uniqueness previously observed in the Ainu.8 Therefore, the aims of this study are to formally test for admixture and to estimate the timing and proportions of admixture in the Japanese populations. In addition, we aim to identify genetic loci that are highly differentiated between the Ainu and the Mainland Japanese.

Materials and methods

Sample data

We used genotype data for 641 314 genome-wide SNPs from the Ainu and Ryukyuans, originally published by the Japanese Archipelago Human Population Genetics Consortium8, and from Mainland Japanese published by Nishida et al.10 We merged these data with those from three HapMap populations11 and three Singaporean populations,12 yielding 431 486 overlapping SNPs. We then merged the combined data with genome-wide SNP data for 50 randomly sampled Korean individuals from Seoul, originally used as controls in a genome-wide association study.13 Including the Korean data reduced the number of overlapping SNPs to 65 256. The populations used in this study are listed in Table 1. All SNP data used here had been published previously.
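Merging keeps only the loci genotyped on every platform, which is why adding a sparser panel such as the Korean data shrinks the overlap so sharply. A minimal sketch in Python, assuming each panel is simply a list of SNP identifiers (the panel names and IDs below are hypothetical); a real merge would also have to reconcile strand orientation and allele coding, which this sketch omits:

```python
from typing import Dict, List


def overlapping_snps(panels: Dict[str, List[str]]) -> List[str]:
    """Return the SNP IDs present in every panel, in the order of the first panel."""
    names = list(panels)
    shared = set(panels[names[0]])
    for name in names[1:]:
        shared &= set(panels[name])  # keep only SNPs also typed on this panel
    return [snp for snp in panels[names[0]] if snp in shared]


# Hypothetical panels: adding a sparser panel can only shrink the overlap.
panels = {
    "ainu_ryukyu": ["snp1", "snp2", "snp3", "snp4", "snp5"],
    "hapmap": ["snp1", "snp2", "snp4", "snp5"],
    "korean": ["snp2", "snp5", "snp9"],
}
print(overlapping_snps({k: panels[k] for k in ("ainu_ryukyu", "hapmap")}))
# ['snp1', 'snp2', 'snp4', 'snp5']
print(overlapping_snps(panels))
# ['snp2', 'snp5']
```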

Table 1 Populations used for ancestry estimation tests

Data analysis

Given the close-knit nature of the Ainu community, we investigated whether closely related individuals were included in the data set. We estimated kinship coefficients and identity-by-descent (IBD) probabilities for all pairs of Ainu individuals using the REAP software,14 which can be applied to populations with admixed ancestry. Because the Ainu were previously shown to have experienced admixture with the Mainland Japanese,8 we used individual ancestry proportions at K=2 from ADMIXTURE analysis15 (Supplementary Figure 2) as part of the input to REAP.
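The kinship coefficient and the probability of sharing zero alleles IBD (IBD0) are what separate the relationship classes: expected kinship is 1/4 for both parent–offspring pairs and full siblings, but expected IBD0 is 0 for parent–offspring and 1/4 for full siblings. The Python sketch below classifies a pair from these two values. The degree cut-offs are the geometric midpoints between successive expected kinship values (the convention used by KING); the IBD0 cut-off of 0.125 is a hypothetical midpoint, not the criterion used in this study, and the pair IDs are invented:

```python
def classify_pair(kinship: float, ibd0: float) -> str:
    """Assign a relationship class from a kinship coefficient and P(IBD = 0).

    Degree boundaries are 2**-1.5, 2**-2.5, 2**-3.5 and 2**-4.5, i.e. the
    geometric midpoints between expected kinship values 1/2, 1/4, 1/8, 1/16, 1/32.
    The IBD0 threshold of 0.125 (hypothetical) splits first-degree pairs into
    parent-offspring (expected IBD0 = 0) and full siblings (expected IBD0 = 0.25).
    """
    if kinship > 2 ** -1.5:
        return "duplicate/MZ twin"
    if kinship > 2 ** -2.5:
        return "parent-offspring" if ibd0 < 0.125 else "full sibling"
    if kinship > 2 ** -3.5:
        return "second degree"
    if kinship > 2 ** -4.5:
        return "third degree"
    return "unrelated"


# Hypothetical per-pair estimates: (id1, id2, kinship, IBD0)
pairs = [("A01", "A02", 0.248, 0.01), ("A03", "A04", 0.251, 0.27), ("A05", "A06", 0.003, 0.99)]
for id1, id2, phi, k0 in pairs:
    print(id1, id2, classify_pair(phi, k0))
```

Dropping one member of every pair classified as parent–offspring then yields the reduced sample used for PCA.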

The relationship between the Ainu and other populations from various locations in the Japanese Archipelago was estimated using the Population Structure Prediction System for Japanese (PCAj).16 The software uses probabilistic PCA to project our genotyped individuals onto a scatterplot similar to that shown by Yamaguchi-Kabata et al.17 The reference Japanese samples included in this software are from the RIKEN Biobank collection.
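PCAj itself uses probabilistic PCA, whose details are not given here. As a simpler stand-in, the Python sketch below illustrates the general idea of projecting new samples onto axes learned from a reference panel with ordinary PCA (via NumPy's SVD): genotypes are centred and scaled with the reference allele frequencies, and new individuals are mapped onto the reference principal axes instead of refitting the PCA. The function names and the simulated data are illustrative only:

```python
import numpy as np


def fit_reference_pca(ref: np.ndarray, n_components: int = 2):
    """Learn principal axes from a reference panel.

    ref: individuals x SNPs matrix of 0/1/2 allele counts, no missing data.
    """
    p = ref.mean(axis=0) / 2.0                   # reference allele frequencies
    keep = (p > 0) & (p < 1)                     # monomorphic SNPs carry no signal
    p = p[keep]
    scale = np.sqrt(2.0 * p * (1.0 - p))         # binomial SD of a genotype
    z = (ref[:, keep] - 2.0 * p) / scale
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return keep, p, scale, vt[:n_components]


def project(new: np.ndarray, keep, p, scale, axes) -> np.ndarray:
    """Place new individuals on the reference axes using the reference normalization."""
    z = (new[:, keep] - 2.0 * p) / scale
    return z @ axes.T


rng = np.random.default_rng(0)
freq_a = rng.uniform(0.05, 0.45, size=200)       # simulated population A
freq_b = freq_a + 0.5                            # simulated population B
ref = np.vstack([rng.binomial(2, freq_a, size=(30, 200)),
                 rng.binomial(2, freq_b, size=(30, 200))])
model = fit_reference_pca(ref)
new_a = project(rng.binomial(2, freq_a, size=(5, 200)), *model)
new_b = project(rng.binomial(2, freq_b, size=(5, 200)), *model)
```

Projecting rather than refitting keeps the reference axes fixed, so a small added sample (such as the Ainu here) cannot rotate the plot.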

To formally test whether Japanese populations are the result of admixture between ancestral Jomon and Yayoi populations, we performed the 3-population test (f3) using the Ainu as surrogates of Jomon ancestors and continental Asians (Han Chinese, Koreans) as surrogates of Yayoi ancestors. We also performed the f4-ratio test to estimate the genetic contributions of source populations in the admixed populations and estimated the time since admixture occurred using rolloff. These three tests are included in the ADMIXTOOLS software package.18

To test whether the Ainu individuals lying between the Ainu and Mainland Japanese clusters in the PCA plot8 are recently admixed, we first phased haplotypes for 20 Ainu individuals, 20 Mainland Japanese individuals and 8 potentially recently admixed Ainu individuals using the FastPhase19 and BEAGLE20 programs. Each of the eight potentially admixed Ainu individuals was phased separately together with the 20 Ainu and 20 Mainland Japanese individuals (Supplementary Figure 3). Pairwise distances between all phased haplotypes were calculated, and the resulting distance matrix was used to construct a neighbor-joining tree21 to assess the affinities of haplotypes from the potentially admixed Ainu individuals.
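The haplotype-based check can be sketched in Python as follows. The text does not specify the distance measure, so this sketch assumes the proportion of differing alleles between phased haplotypes (p-distance) and implements the standard neighbor-joining algorithm, returning only the tree topology as nested tuples. Function names and labels are hypothetical; a phylogenetics package that also keeps branch lengths would normally be used:

```python
import numpy as np


def p_distance(haps: np.ndarray) -> np.ndarray:
    """Pairwise proportion of differing alleles between rows of a 0/1 haplotype matrix."""
    return (haps[:, None, :] != haps[None, :, :]).mean(axis=2)


def neighbor_joining(dist: np.ndarray, labels):
    """Unrooted neighbor-joining topology as nested tuples (branch lengths omitted)."""
    d = np.array(dist, dtype=float)
    nodes = list(labels)
    while len(nodes) > 2:
        n = len(nodes)
        r = d.sum(axis=1)
        q = (n - 2) * d - r[:, None] - r[None, :]   # NJ Q-matrix
        np.fill_diagonal(q, np.inf)
        i, j = np.unravel_index(np.argmin(q), q.shape)
        # distance from the new internal node to every remaining node
        new_row = 0.5 * (d[i] + d[j] - d[i, j])
        keep = [k for k in range(n) if k not in (i, j)]
        d = np.vstack([
            np.column_stack([d[np.ix_(keep, keep)], new_row[keep]]),
            np.append(new_row[keep], 0.0),
        ])
        nodes = [nodes[k] for k in keep] + [(nodes[i], nodes[j])]
    return tuple(nodes)


# Hypothetical tree-like distances: A-B and D-E are pairs of close haplotypes.
labels = ["A", "B", "C", "D", "E"]
dist = np.array([
    [0, 2, 4, 6, 6],
    [2, 0, 4, 6, 6],
    [4, 4, 0, 4, 4],
    [6, 6, 4, 0, 2],
    [6, 6, 4, 2, 0],
], dtype=float)
print(neighbor_joining(dist, labels))  # (('C', ('A', 'B')), ('D', 'E'))
```

For a recently admixed individual, one would expect its two haplotypes to land in different clades, one among Ainu haplotypes and one among Mainland Japanese haplotypes.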

To identify genomic regions that are highly differentiated in the Ainu, we calculated pairwise Fst22 between the Ainu and the Mainland Japanese after omitting potentially recently admixed individuals from each population. We focused on the top 1% most highly differentiated SNPs and performed a gene ontology enrichment analysis using the GOrilla tool23 to determine whether the genes containing these SNPs are enriched for particular biological functions.
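The estimator behind reference 22 is not described here. As an illustration only, the Python sketch below uses Hudson's Fst estimator in its per-SNP form with a finite-sample correction, computed from sample allele frequencies and numbers of sampled chromosomes, and then selects the top 1% of SNPs. The sample sizes in the usage are hypothetical (for example, 31 diploid Ainu individuals would give 62 chromosomes):

```python
import numpy as np


def hudson_fst(p1, p2, n1, n2):
    """Per-SNP Hudson Fst between two samples.

    p1, p2: sample allele frequencies (arrays); n1, n2: numbers of sampled
    chromosomes. SNPs monomorphic in both samples (0/0) are returned as NaN.
    """
    p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(den > 0, num / den, np.nan)


def top_fraction(fst, fraction=0.01):
    """Indices of the most differentiated SNPs (top `fraction`), highest Fst first."""
    fst = np.asarray(fst, dtype=float)
    valid = np.flatnonzero(np.isfinite(fst))        # skip NaN (monomorphic) SNPs
    k = max(1, int(round(fraction * valid.size)))
    order = valid[np.argsort(fst[valid])[::-1]]
    return order[:k]


# Fixed difference, identical frequencies, and a monomorphic SNP.
fst = hudson_fst([1.0, 0.5, 0.0], [0.0, 0.5, 0.0], n1=62, n2=100)
print(fst)
```

The lowest Fst among the selected SNPs then plays the role of the cut-off reported in the Results.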

Results

We identified five parent–offspring pairs and two sibling pairs in the Ainu based on the kinship coefficients and the probability of sharing zero alleles identical by descent (Supplementary Figure 4). We therefore omitted one individual from each parent–offspring pair and used the remaining 31 individuals for PCA. The result of the new PCA using ~65 k SNPs is shown in Figure 1. The first principal component (PC1) separates the Ainu from the other East Asian populations, and the population closest to the Ainu is the Ryukyuans, consistent with previous observations. The Ainu individuals are spread out in the same ‘comet-like’ pattern as before,8 but no outlier Ainu individuals were separated along PC2. When the Korean data set was omitted, the resulting PCA using ~430 k SNPs showed a similar pattern (Supplementary Figure 5). We therefore surmise that the five outlier Ainu individuals seen in the previous PCA plot8 were an artifact due to the inclusion of very closely related individuals in the Ainu sample.

Figure 1

Principal component analysis (PCA) plot after omitting closely related Ainu individuals.

The evolutionary factors underlying the variation captured by PC2 are difficult to infer. If Ryukyuan individuals are disregarded, the PC2 axis from top to bottom seems to reflect a south-to-north geographical cline: Singapore Han Chinese, Beijing Han Chinese (CHB), Koreans and Mainland Japanese. However, Ryukyuan individuals lie at the top of the PC2 axis, above the Mainland Japanese. One possible explanation is that some unknown populations were involved in the formation of the Ryukyuans, as previously suggested.24

We identified eight Ainu individuals who might be recently admixed with Mainland Japanese, based on their intermediate positions between the Ainu and Mainland Japanese clusters in the PCA plot (Supplementary Figure 3). If these individuals resulted from very recent admixture, then one chromosome of each pair should derive from an Ainu parent and the other from a Mainland Japanese parent. The neighbor-joining trees of chromosome 22 haplotypes phased using FastPhase and BEAGLE for each of the eight potentially admixed Ainu individuals are shown in Supplementary Figure 6. The haplotype affinities of these individuals did not show a consistent pattern: three individuals had haplotype affinities with the Ainu and four with Mainland Japanese haplotypes. However, in one individual (labeled 2120001B03), one haplotype clustered with other Ainu haplotypes and the other with Mainland Japanese haplotypes, indicating that this person may be the result of recent admixture.

The 31 Ainu individuals, together with 35 Ryukyuans and 50 randomly chosen Mainland Japanese from the Kanto region, were compared with individuals from various geographical locations in the Japanese Archipelago using probabilistic PCA.16 Our samples were overlaid on a scatterplot showing major clusters of Mainland Japanese, CHB and Ryukyuans. As seen in Figure 2, our Ryukyuan (green dots) and Kanto-region Mainland Japanese samples (red dots) fall within the well-defined clusters reported by Yamaguchi-Kabata et al.17 Interestingly, the Ainu individuals (blue dots) form a gradient alongside the cluster of Ryukyuan individuals on the vertical axis. Several individuals from the RIKEN data set (gray dots) cluster with our Ainu samples. Although their ethnicity was not reported, they are most likely Ainu, given their geographical origin in Hokkaido, where most Ainu people currently reside. The presence of Ainu individuals in the RIKEN data was not reported by either Yamaguchi-Kabata et al.17 or Kumasaka et al.16

Figure 2

Scatterplot output from PCAj software. The three populations from the Japanese Archipelago Human Population Genetics Consortium8 are shown as colored dots. Other Japanese and Chinese individuals are shown as gray dots.

We tested the dual-structure model for the origin of modern Japanese using the 3-population test (f3 test). We used two data sets for this test: a high-density SNP data set without the Korean data (~430 k SNPs) and a low-density data set including the Korean data (~65 k SNPs). A negative f3 value indicates that the target population is the result of admixture between populations related to the two source populations, and the Z-score serves as a measure of statistical significance. Combinations with continental Asians (Han Chinese, Koreans) and Ainu as source populations and the Mainland Japanese and Ryukyuans as target populations gave the most significant results (Table 2). With the high-density data set, the combination (Mainland Japanese; CHB, Ainu) gave the most significant result (f3=−7.0 × 10⁻³; Z-score=−28.8), whereas with the low-density data set, the combination (Mainland Japanese; Korean, Ainu) gave the most significant result (f3=−7.1 × 10⁻³; Z-score=−28.6). Tests with the Ryukyuans as the target population showed similar patterns, but with Z-scores of smaller magnitude than for the Mainland Japanese. These results indicate that the Mainland Japanese are the result of admixture between the ancestors of Han Chinese/Koreans and the Ainu, who represent the Yayoi and Jomon peoples, respectively, adding further support to the dual-structure model3 for the origin of the Mainland Japanese.
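For a target C and sources A and B, f3(C; A, B) is the average over SNPs of (c − a)(c − b), where lowercase letters are population allele frequencies. If C were an exact mixture, c = αa + (1 − α)b, each term would equal −α(1 − α)(a − b)², which is never positive; drift specific to C adds positive terms, so a significantly negative f3 is evidence of admixture. The Python sketch below computes this uncorrected statistic with a delete-one-block jackknife Z-score over equal-sized blocks of consecutive SNPs. It is a simplification: ADMIXTOOLS also applies a finite-sample correction for the target population and uses a weighted block jackknife over large genomic blocks. The simulated frequencies and the 18:82 mixture are arbitrary:

```python
import numpy as np


def f3_per_snp(c, a, b):
    """Per-SNP f3(C; A, B) = (c - a)(c - b), without finite-sample correction."""
    return (np.asarray(c) - a) * (np.asarray(c) - b)


def block_jackknife(values, n_blocks=50):
    """Mean of per-SNP values and its Z-score from a delete-one-block jackknife.

    Blocks are equal-sized runs of consecutive SNPs (a simplification of the
    genomic blocks used by ADMIXTOOLS).
    """
    values = np.asarray(values, dtype=float)
    blocks = np.array_split(values, n_blocks)
    total, n = values.sum(), values.size
    estimate = total / n
    # estimate recomputed with each block left out
    loo = np.array([(total - blk.sum()) / (n - blk.size) for blk in blocks])
    g = len(blocks)
    se = np.sqrt((g - 1) / g * np.sum((loo - loo.mean()) ** 2))
    return estimate, estimate / se


rng = np.random.default_rng(42)
a = rng.uniform(0.0, 1.0, 5000)        # simulated source A frequencies
b = rng.uniform(0.0, 1.0, 5000)        # simulated source B frequencies
c = 0.18 * a + 0.82 * b                # target: an exact 18:82 mixture
f3, z = block_jackknife(f3_per_snp(c, a, b))
print(f"f3 = {f3:.4f}, Z = {z:.1f}")
```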

Table 2 Results of 3-population admixture test

To estimate the genetic contributions of the ancestral populations to the Japanese, we conducted the f4-ratio estimation test. The populations used in this test are assumed to be related according to the tree shown in Figure 3. Unlike the f3 test, this test includes an outgroup and an additional population that is closely related to one of the source populations (continental Asians). Of the various population combinations tested, those with the most significant Z-scores are shown in Table 3. The proportion of Jomon ancestry (ancestors of the Ainu) in the Mainland Japanese was estimated to be 17.8% (Z-score 72.3) with CHB as the other source population. With the smaller data set and Koreans as the source population, the estimate was slightly higher, at 17.9% (Z-score 64.9). The proportion of Jomon ancestry in the Ryukyuans was 28.4% (Z-score 43.8) with the high-density SNP data and 27.8% (Z-score 40.0) with the low-density SNP data. The higher proportion of Jomon ancestry in the Ryukyuans than in the Mainland Japanese is consistent with the individual ancestry estimates from ADMIXTURE (Supplementary Figure 2).
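With f4(W, X; Y, Z) defined as the SNP average of (w − x)(y − z), the f4-ratio estimate of α is f4(A, O; X, C) / f4(A, O; B, C), where X is the admixed population, B and C are related to its two sources, A shares drift with B, and O is an outgroup. Reading Figure 3 with B as a continental Asian population, A as a second population close to it, and C as the Ainu is our interpretation of the description above; the combinations actually used are those in Table 3. The Python sketch below omits the jackknife standard error and, on simulated frequencies, recovers α exactly when X is a noise-free mixture, because then f4(A, O; X, C) = α · f4(A, O; B, C):

```python
import numpy as np


def f4(w, x, y, z):
    """f4(W, X; Y, Z): SNP average of (w - x)(y - z) over allele frequencies."""
    return float(np.mean((np.asarray(w) - x) * (np.asarray(y) - z)))


def f4_ratio(a, o, x, b, c):
    """alpha = f4(A, O; X, C) / f4(A, O; B, C); 1 - alpha is C's share of X."""
    return f4(a, o, x, c) / f4(a, o, b, c)


rng = np.random.default_rng(7)
b, c, o = (rng.uniform(0.05, 0.95, 10_000) for _ in range(3))
a = np.clip(b + rng.normal(0.0, 0.05, b.size), 0.0, 1.0)  # A shares drift with B
x = 0.82 * b + 0.18 * c                   # hypothetical 82:18 mixture
alpha = f4_ratio(a, o, x, b, c)
print(f"alpha = {alpha:.3f}, 1 - alpha = {1 - alpha:.3f}")
```

In the notation of Figure 3, 1 − α is the Jomon proportion reported in Table 3.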

Figure 3

Schematic diagram showing the relationship between populations used for the estimation of Jomon (1−α) and Yayoi (α) proportions in the Mainland Japanese.

Table 3 Proportion of Jomon ancestry (1−α) estimated from the f4-ratio test

We further used the rolloff program18 to estimate the time since the admixture event between the Yayoi and Jomon ancestors that gave rise to the Japanese populations. Because the Ainu sample includes admixed individuals, we created three subsets of Ainu individuals with different levels of admixture (Supplementary Figure 7) to assess how sensitive the rolloff estimates are to this admixture. Because the test is based on the decay of linkage disequilibrium over time, we used the higher-density SNP data set (~430 k SNPs). We used CHB and each of the three Ainu subsets as source populations, and the Mainland Japanese and Ryukyuans as admixed populations. As shown in Table 4, the Ainu subset with the fewest admixed individuals (Ainu-15) yielded the oldest time since admixture for the Mainland Japanese (58 generations ago, or 1450 years ago assuming a generation time of 25 years). As more admixed individuals were included in the Ainu subset, the estimated time since admixture became more recent (55 generations ago, or 1375 years ago). The estimates with the Ryukyuans as the admixed population ranged from 43 to 44 generations ago (1075–1100 years ago), more recent than those for the Mainland Japanese.
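rolloff fits an exponential decay, of the form A·exp(−n·d) + K, to a weighted LD statistic as a function of genetic distance d in Morgans, and n is read as the number of generations since admixture; multiplying by the assumed generation time gives years (for example, 58 × 25 = 1450). The Python sketch below reproduces only the curve-fitting and conversion steps on a simulated, noise-free curve; computing the weighted LD statistic itself is not shown. It scans n over a grid and solves for A and K by linear least squares at each value, which is our choice for robustness rather than rolloff's documented fitting procedure:

```python
import numpy as np


def fit_ld_decay(d_morgans, ld, n_grid=None):
    """Fit ld ~ A * exp(-n * d) + K; return (n, A, K).

    n is scanned over a grid; for each n, A and K enter linearly and are
    solved by least squares. The best n minimizes the residual sum of squares.
    """
    if n_grid is None:
        n_grid = np.arange(1.0, 300.0, 0.25)   # candidate generations
    best = None
    for n in n_grid:
        design = np.column_stack([np.exp(-n * d_morgans), np.ones_like(d_morgans)])
        coef, *_ = np.linalg.lstsq(design, ld, rcond=None)
        rss = float(np.sum((design @ coef - ld) ** 2))
        if best is None or rss < best[0]:
            best = (rss, float(n), float(coef[0]), float(coef[1]))
    return best[1:]


def generations_to_years(generations, generation_time=25.0):
    """Convert generations to years under an assumed generation time."""
    return generations * generation_time


d = np.linspace(0.005, 0.3, 60)            # 0.5-30 cM, expressed in Morgans
curve = 0.01 * np.exp(-58 * d) + 0.0005     # simulated decay for 58 generations
n, amp, offset = fit_ld_decay(d, curve)
print(n, generations_to_years(n))
```

Because rolloff assumes a single pulse, a curve produced by several admixture events would still be summarized by one n.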

Table 4 Estimation of time since admixture for Mainland Japanese and Ryukyuans, using Han Chinese (CHB) and Ainu as source populations

We also identified SNP loci that are differentiated between the Ainu and the Mainland Japanese using pairwise Fst values. Pairwise Fst values ranged from 0 to 0.8903, with a mean of 0.0407, and the majority of SNPs (~400 000) had Fst values below 0.02. The top 1% comprised 6413 SNPs, all with Fst values above 0.36; their Fst values and annotations are listed in Supplementary Table 1. Some of these top 1% SNPs lie in genes reported to be associated with facial structure in Europeans25 and with hair and tooth morphology in East Asians.26, 27 The distributions of Fst values for SNPs in those genes are shown in Figure 4. Two of the five facial morphology genes (PAX3 and COL17A1) contain highly differentiated SNPs, as does the hair/tooth morphology gene EDAR. Gene ontology analysis of the top 1% SNPs showed enrichment for biological processes and cellular components involving collagen (Supplementary Figure 8).
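GOrilla can compare a target gene list against a background list using a hypergeometric model. The Python sketch below shows that calculation for a single GO term with `scipy.stats.hypergeom`: the probability of observing at least k term-annotated genes in the target list if target genes were drawn at random from the background. All gene counts are hypothetical, and a real analysis must also correct for the many GO terms tested, as GOrilla does when it reports FDR q-values:

```python
from scipy.stats import hypergeom


def term_enrichment(n_background, n_term_background, n_target, n_term_target):
    """One-sided hypergeometric p-value and fold enrichment for one GO term.

    n_background: genes in the background list
    n_term_background: background genes annotated with the term
    n_target: genes in the target list (e.g. genes carrying top-1% Fst SNPs)
    n_term_target: target genes annotated with the term
    """
    # P(X >= n_term_target) for X ~ Hypergeometric(M, n, N)
    p_value = hypergeom.sf(n_term_target - 1, n_background, n_term_background, n_target)
    expected = n_target * n_term_background / n_background
    return float(p_value), n_term_target / expected


# Hypothetical counts: 20 of 500 target genes carry a collagen-related term
# that annotates 100 of 10 000 background genes (5 expected by chance).
p, fold = term_enrichment(10_000, 100, 500, 20)
print(f"p = {p:.2e}, fold enrichment = {fold:.1f}")
```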

Figure 4

Distribution of differentiated SNPs in the Ainu for the five genes associated with facial morphology in Europeans (Liu et al.25) and for the hair and tooth morphology gene EDAR. The top 1% most highly differentiated SNPs have Fst values above 0.36, indicated by the red horizontal lines.

Discussion

We identified several very closely related pairs of individuals within the Ainu sample, which had not been reported previously.8 Including closely related individuals can produce artifactual results in individual-based clustering methods such as PCA or STRUCTURE. Although the PCA plot after omitting closely related Ainu individuals differs slightly from the previous result,8 the overall pattern remains unchanged. The gradient of Ainu individuals along PC1 can be partially explained by recent admixture between Ainu and Mainland Japanese parents, as well as by intermarriage between individuals with different proportions of Ainu and Mainland Japanese ancestry.

We also had the opportunity to compare our Ainu data with data from Japanese individuals living in different parts of the Japanese Archipelago, as represented in the RIKEN Biobank data.16 The Ryukyuans and the Mainland Japanese from the Kanto region (Supplementary Figure 1) in our data overlap with the RIKEN data. The Ainu individuals still form a gradient, but are positioned adjacent to the cluster of Ryukyuan individuals (Figure 2). The three Ainu individuals appearing in the Mainland Japanese cluster were also observed with the RIKEN data. Although there was no major discrepancy between our data and the RIKEN data, the position of the Ainu in Figure 2 differs slightly from that in Figure 1. This may reflect differences in sample sizes between the two data sets: in the RIKEN data, the majority of individuals (~1000) belong to the Mainland Japanese cluster, with only a few placed within our Ainu cluster.

Another important finding is the clear difference between the Ainu and Mainland Japanese from the northern Tohoku region. Yamaguchi-Kabata et al.17 showed that people living in the Tohoku region are distributed slightly lower within the Mainland Japanese cluster, which overlaps with the Kanto-region Mainland Japanese (red) shown in Figure 2. Historically, people in the northern Tohoku area were called ‘Emishi’, while the Ainu people were called ‘Ezo’, and both names were written with the same Chinese characters. It is still unclear whether the Emishi and Ezo people were genetically similar. Some cultural components, such as place names and pottery, are shared between the Emishi in the Tohoku region and the Ezo in Hokkaido, who eventually became the Ainu people.28 However, the PCA shown in Figure 2 suggests that some unknown factor(s) may have been involved in the formation of the Emishi people in the Tohoku region.

The Japanese Archipelago Human Population Genetics Consortium8 presented a figure depicting the possible evolutionary history of the populations of the Japanese Archipelago. Here we formally tested this history using the 3-population test, f4-ratio estimation and rolloff, methods that have previously been applied to other populations.29 In particular, we tested the plausibility of the dual-structure model for the origin of the current Japanese, using the Ainu and continental Asians as ancestral populations. The results of the 3-population test show that the Mainland Japanese are indeed the result of admixture between the ancestors of the Ainu (Jomon people) and continental Asians (Yayoi people), and the f4-ratio test estimates an ~18% contribution from Jomon ancestry. Results for the Ryukyuans also indicate that they are admixed, although their Z-scores were less significant than those for the Mainland Japanese (Tables 2 and 3). The estimated Jomon component was higher in the Ryukyuans, at ~28%. This was expected, given the low but significant affinity between the Ainu and the Ryukyuans. Our Jomon ancestry estimates were lower than those reported previously, which ranged from 30 to 40% in the Mainland Japanese and 50–60% in the Ryukyuans.30, 31 In this study, we used high-density autosomal SNP data from the Ainu, whereas previous studies used only Y-chromosome polymorphisms30 or used non-Ainu populations as the ancestral population.31 These differences, together with other methodological parameters, may account for the differing estimates.

The time since admixture estimated using rolloff varied from 55 to 58 generations ago, depending on the number of admixed Ainu included in the data set. With more Ainu individuals who are less admixed with the Mainland Japanese, the estimate could be even older. Our estimate of 1450 years ago may be treated as a lower bound, because rolloff assumes a single admixture event and estimates the most recent one. This estimate coincides with the period (5th–8th century) during which the Japanese government expanded its territory into the northern part (Tohoku) of Honshu Island.28 Interestingly, the admixture event that led to the formation of the Ryukyuans was more recent than that for the Mainland Japanese. The estimated admixture time of ~1100 years ago corresponds to the Gusuku period of Okinawan archeological history.32 One plausible scenario is that the Jomon people who had already settled in the Ryukyu Islands admixed with migrants from the continent who may themselves have already admixed with the Jomon people on the Japanese mainland.

We also report several SNP loci that are highly differentiated between the Ainu and the Mainland Japanese, including SNPs in two genes associated with facial structure in Europeans.25 SNPs within the EDAR gene have been reported to be highly differentiated between the Mainland Japanese and Ryukyuans,17 and we observed a similar pattern between the Ainu and Mainland Japanese. However, the non-synonymous SNP rs3827760 reported by Yamaguchi-Kabata et al.17 is not covered by the Affymetrix 6.0 GeneChip. They also reported that a SNP in the ABCC11 gene, which determines earwax type, was highly differentiated between the Ryukyuans and the Mainland Japanese. However, possibly because of poor coverage of ABCC11 on the Affymetrix GeneChip, all SNPs in that gene were monomorphic in both the Ainu and the Mainland Japanese. The Ainu are known to have physical characteristics very different from those of the Mainland Japanese, and the finding of differentiated loci associated with observable phenotypes agrees with work by physical anthropologists over the years.33

To conclude, we have extended our initial analysis of the Ainu population and demonstrated that their ancestors contributed significantly to the genetic makeup of current populations in the Japanese Archipelago. Given the unique genetic status of the Jomon people, it is understandable that the Ainu, who inherited the highest proportion of Jomon ancestry, are quite distinct among all extant East Eurasians.