Introduction

Genetic and fossil evidence supports the ‘Out-of-Africa’ model, in which modern humans evolved in Africa and subsequently migrated through Eurasia to reach the Pacific and the Americas.1 The expansion is thought to have begun approximately 45 000–60 000 years ago.2 The signature of the series of colonization events during this expansion persists as a continuous loss of genetic diversity with increasing distance from the ancestral population, a pattern known as the ‘serial founder effect’.3, 4, 5 When humans encountered new environments, including distinct pathogens, alleles that were advantageous under those conditions could have increased in frequency through natural selection. Several lines of evidence also indicate that admixture events occurred between modern humans and archaic hominins.6, 7 Such events, along with random genetic drift, have influenced geographic patterns of genetic variation in present-day human populations, which may contribute to geographical differences in susceptibility to infectious diseases and common diseases, such as autoimmune and metabolic diseases.

The human leukocyte antigen (HLA) region, the human equivalent of the major histocompatibility complex, spans approximately 3.6 megabases on the short arm of chromosome 6. The HLA region contains many genes involved in immune function. Genetic variation in the HLA region is associated with many diseases, including autoimmune and infectious diseases.8 Recently, several HLA alleles have been reported to be associated with severe and fatal drug hypersensitivity reactions.9, 10, 11 Additionally, the HLA region is the most polymorphic region in the human genome.8 More than 13 000 HLA alleles have been deposited in the IMGT/HLA Database.12 The frequency distributions of HLA alleles and haplotypes have been used to track human evolutionary processes, such as migration, admixture and selection.13

Here we review the characteristics of the frequency distributions of HLA alleles and haplotypes in the Japanese population. First, we compare HLA frequencies in the Japanese population with those in other populations, showing that the Japanese population shares a large proportion of its HLA alleles and haplotypes with East and South East Asian populations but also has some unique properties. Second, we consider genetic differentiation in HLA frequencies across Japanese regional populations, which may provide clues for modeling the peopling of the Japanese Archipelago and for designing genetic association studies. Finally, we introduce recent reports suggesting that new HLA variants derived from ancient admixture with Neanderthals and Denisovans played an important role in the adaptation of humans to local pathogens during the Out-of-Africa expansion.

Characteristics of HLA alleles and haplotypes in Japanese population compared with other populations

The polymorphic nature of the HLA region is thought to have been shaped by balancing selection and maintained by an overdominance effect, in which heterozygotes are assumed to have greater fitness than homozygotes because they can recognize a wider range of pathogens.14, 15 This pathogen-driven balancing selection model is supported by the finding that the heterozygosity of HLA loci in a population is associated with the pathogen richness of the region where that population resides.16, 17 At the same time, the heterozygosity at each HLA locus was shown to be strongly associated with distance from East Africa, reinforcing the presence of a serial founder effect on HLA diversity (Figure 1).16, 17 The serial founder model assumes that a small part of a population founded colonies nearby before the original population reached demographic equilibrium, which could have created allele frequency differences between the original and migrated populations. As the HLA region is characterized by strong linkage disequilibrium (LD),8 long haplotypes encompassing several HLA loci have been preserved in present-day populations. Therefore, the sharing of HLA haplotypes among populations can be used to infer human migration routes.13
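The loss of diversity implied by the serial founder model can be illustrated with a deliberately simple calculation. The sketch below assumes that every founder event samples the same number of diploid founders at random and that nothing else (mutation, selection, gene flow, later population growth) changes diversity; under these assumptions expected heterozygosity falls by a factor of (1 − 1/(2N)) per event. The function name and the numbers in the usage lines are hypothetical and are not taken from the studies cited above.

```python
def heterozygosity_after_founder_events(h0, founders, events):
    """Expected heterozygosity after a chain of founder events.

    Illustrative assumptions: each event samples `founders` diploid
    individuals at random, and no other force changes diversity.
    """
    if founders < 1:
        raise ValueError("founders must be at least 1")
    if events < 0:
        raise ValueError("events must be non-negative")
    # Each random sample of N diploid founders retains, in expectation,
    # a fraction (1 - 1/(2N)) of the heterozygosity of its source.
    return h0 * (1.0 - 1.0 / (2.0 * founders)) ** events


if __name__ == "__main__":
    # Hypothetical starting diversity and founder size, for illustration only.
    for k in (0, 10, 20, 40):
        print(k, round(heterozygosity_after_founder_events(0.95, 50, k), 4))
```

The monotonic decline with the number of founder events is the qualitative pattern that the distance-from-East-Africa regressions are interpreted as reflecting; real HLA data are additionally affected by balancing selection, so this is a sketch of the neutral expectation only.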

Figure 1

A continuous loss of HLA diversity with increasing distance from East Africa. (a) HLA-A data from 66 countries, (b) HLA-B data from 64 countries and (c) HLA-C data from 38 countries. When the AFND contained multiple populations for a country, we retrieved the one population with the largest sample size and allele frequency data at the four-digit level.25 The y axis corresponds to the expected heterozygosity of each population at the corresponding HLA locus, calculated by the following equation: $H = \frac{n}{n-1}\left(1 - \sum_{i} p_i^{2}\right)$, where $n$ is the sample size and $p_i$ is the allele frequency of the $i$th allele in the population. The x axis corresponds to the geographic distance (km) between the location of a population and Addis Ababa. The geographic distances were calculated according to Ramachandran et al.3 by using five obligatory waypoints (Cairo, Egypt; Istanbul, Turkey; Anadyr, Russia; Phnom Penh, Cambodia; and Prince Rupert, Canada).
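The caption defines expected heterozygosity in terms of a sample size n and allele frequencies p_i. A minimal sketch of that calculation follows, assuming the commonly used sample-size-corrected form H = n/(n − 1) × (1 − Σ p_i²); the exact estimator used for Figure 1 is an assumption here, as are the function name and the example frequencies.

```python
def expected_heterozygosity(allele_freqs, n):
    """Sample-size-corrected expected heterozygosity at one locus.

    allele_freqs: allele frequencies p_i at the locus.
    n: sample size used for the correction factor n / (n - 1).
    Assumed form: H = n / (n - 1) * (1 - sum(p_i ** 2)).
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    if any(p < 0.0 or p > 1.0 for p in allele_freqs):
        raise ValueError("allele frequencies must lie in [0, 1]")
    if sum(allele_freqs) > 1.0 + 1e-6:
        raise ValueError("allele frequencies must not sum to more than 1")
    homozygosity = sum(p * p for p in allele_freqs)
    return n / (n - 1) * (1.0 - homozygosity)


if __name__ == "__main__":
    # Hypothetical locus with one predominant allele and several minor ones.
    freqs = [0.35, 0.15, 0.12, 0.10, 0.10, 0.08, 0.06, 0.04]
    print(round(expected_heterozygosity(freqs, 1000), 4))
```

A locus with one predominant allele yields a lower value than a locus whose alleles are evenly distributed, which is why this statistic is a convenient single-number summary of diversity for cross-population comparison.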

Below, we discuss the characteristics of the frequency distributions of HLA alleles and haplotypes in the Japanese population. These frequencies have been intensively analyzed18, 19, 20, 21 to meet the demands of HLA matching between donors and recipients in hematopoietic stem cell transplantation22, 23 and of pharmacogenomics studies.24 The Allele Frequency Net Database (AFND) provides the frequencies of HLA alleles and haplotypes from diverse populations.25 In this review, we draw the frequency information mainly from the AFND.

HLA alleles

Compared with the other highly polymorphic HLA loci, HLA-A is unusual in that only a few alleles show predominantly high frequencies in Asian, Native American and European populations.26 In the Japanese population, the allele frequency of HLA-A*24:02 reaches about 35%, indicating that approximately 60% of Japanese people carry this allele.18, 19, 20, 21 A*24:02 is distributed worldwide and is especially dominant in Taiwanese aboriginal (~86.3%), Oceanian (~74.4%) and Native American populations (~61.4%). The allele frequencies of HLA-A*02:01, A*02:06, A*11:01, A*26:01, A*31:01 and A*33:03 have been reported to be >5% in Japanese populations.18, 19, 20, 21 All of these HLA-A alleles are common in East Asian populations, including Chinese and Korean populations. A*02:06 is common in Asian, Oceanian and Native American populations but occurs at very low frequency in African and European populations. A*26:01 is characterized by its high frequency in the Ainu people of Japan, who are descendants of indigenous Japanese.27
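The step from an allele frequency of about 35% to roughly 60% of individuals carrying the allele can be made explicit. The sketch below assumes Hardy–Weinberg proportions, under which the proportion of carriers (homozygotes plus heterozygotes) is 1 − (1 − p)²; the function name is hypothetical.

```python
def carrier_frequency(p):
    """Proportion of individuals carrying at least one copy of an allele.

    Assumes Hardy-Weinberg proportions: carriers = 1 - (1 - p) ** 2.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return 1.0 - (1.0 - p) ** 2


if __name__ == "__main__":
    # Allele frequency of about 35% for HLA-A*24:02, as stated in the text.
    print(round(carrier_frequency(0.35), 4))
```

For p = 0.35 this gives 1 − 0.65² = 0.5775, that is, about 58%, in line with the approximate figure of 60% quoted above.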

The frequency distributions of HLA-B, -C, -DRB1 and -DQB1 alleles are relatively even, with several representative alleles present at fairly balanced frequencies. B*07:02, B*15:01, B*35:01, B*40:01, B*40:02, B*44:03, B*46:01, B*51:01, B*52:01 and B*54:01 are distributed evenly, with frequencies ranging from 5% to 10% in the Japanese population.18, 19, 20, 21 At the HLA-C locus, nine alleles have frequencies >5% (C*01:02, C*03:03, C*03:04, C*04:01, C*07:02, C*08:01, C*12:02, C*14:02 and C*14:03).18, 19, 20, 21 HLA-DRB1*01:01, DRB1*04:05, DRB1*08:03, DRB1*09:01, DRB1*13:02, DRB1*15:01 and DRB1*15:02 are common in Japanese (5–15%).18, 19, 20, 21, 28 The common DQB1 alleles with frequencies >5% are DQB1*03:01, DQB1*03:02, DQB1*03:03, DQB1*04:01, DQB1*05:01, DQB1*06:01, DQB1*06:02 and DQB1*06:04.18, 19, 29 In contrast, two alleles show predominantly high frequencies at the DPB1 locus (DPB1*05:01, 35%; DPB1*02:01, 25%), followed by several common alleles (>5%; DPB1*04:01, DPB1*04:02 and DPB1*09:01).18, 19, 20

B*46:01 and B*54:01 are prevalent only in Asian populations. DQB1*04:01, which is distributed from Siberia to Oceania, reaches its highest frequency among all the populations in the AFND in Japanese.25 C*14:03 and DPB1*09:01 are also observed at their highest frequencies in the Japanese population. C*14:03 is common in the Korean population but occurs at low frequency in Chinese and Taiwanese populations. Several ethnic groups in Cameroon, Kurdish people in Georgia and Khandesh people in India also commonly harbor C*14:03.

Many HLA alleles are common in other populations but rare in Japanese.20 For example, the following alleles are common in European populations but rare in Japanese: A*01:01, A*03:01, B*08:01, B*18:01, B*44:02, C*02:02, C*05:01, C*12:03, DRB1*03:01, DRB1*07:01, DRB1*08:01, DRB1*11:04, DRB1*13:01 and DRB1*16:01.

HLA haplotypes

The extent of LD differs among pairs of HLA loci. Figure 2 shows pairwise LD between HLA-A, -C, -B, -DRB1 and -DPB1 in Japanese. The extent of LD was measured by the normalized global disequilibrium Wn, a measure of the total disequilibrium between two loci calculated by summing the deviations contributed by each of the individual haplotypes in a multiallelic system.30, 31 As expected from their close proximity, the alleles of HLA-C and HLA-B are in strong LD. Because recombination hot spots exist within the major histocompatibility complex, especially between DRB1 and DPB1,32, 33 the alleles of DPB1 are in weak LD with the alleles of the other loci. Although our data did not include DQB1, a previous report demonstrated that the alleles of DRB1 and DQB1 are in strong LD.18 According to HLA typing data for many families, the estimated recombination probabilities for A-C, B-DRB1 and A-DRB1 are 0.54, 0.54 and 1.08%, respectively.21

Figure 2

Extent of LD for each pair of HLA loci in the Japanese population. LD was measured by the normalized global disequilibrium Wn, a measure of the total disequilibrium between two loci calculated by summing the deviations contributed by each of the individual haplotypes in a multiallelic system. HLA genotyping data from mainland Japanese individuals analyzed in our previous study were used.20
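For readers unfamiliar with Wn, the following sketch shows one common way to compute it from two-locus haplotype frequencies. It assumes the definition Wn = sqrt[ Σ_i Σ_j D_ij² / (p_i q_j) / min(k − 1, l − 1) ], where D_ij = h_ij − p_i q_j, h_ij is the frequency of the haplotype carrying allele i at the first locus and allele j at the second, and k and l are the numbers of alleles at the two loci. The function name and the toy frequencies are hypothetical, and haplotype estimation from unphased genotypes is not covered.

```python
import math


def normalized_global_disequilibrium(hap_freqs):
    """Wn for two multiallelic loci.

    hap_freqs: dict mapping (allele_at_locus_1, allele_at_locus_2) to the
    two-locus haplotype frequency; frequencies are expected to sum to 1.
    """
    # Marginal allele frequencies at each locus.
    p, q = {}, {}
    for (a, b), h in hap_freqs.items():
        p[a] = p.get(a, 0.0) + h
        q[b] = q.get(b, 0.0) + h
    # Ignore alleles that were listed but never observed.
    p = {a: f for a, f in p.items() if f > 0.0}
    q = {b: f for b, f in q.items() if f > 0.0}
    k, l = len(p), len(q)
    if min(k, l) < 2:
        raise ValueError("each locus needs at least two observed alleles")
    # Sum the squared deviations from independence over all allele pairs,
    # including pairs whose haplotype was not observed (h_ij = 0).
    total = 0.0
    for a, pa in p.items():
        for b, qb in q.items():
            d = hap_freqs.get((a, b), 0.0) - pa * qb
            total += d * d / (pa * qb)
    return math.sqrt(total / min(k - 1, l - 1))


if __name__ == "__main__":
    # Toy example: alleles at the two loci are perfectly associated.
    toy = {("C*12:02", "B*52:01"): 0.5, ("C*01:02", "B*54:01"): 0.5}
    print(normalized_global_disequilibrium(toy))
```

Under this definition Wn is 0 when the loci are independent and 1 when the alleles of the less polymorphic locus are fully determined by the other locus, which makes values comparable across locus pairs with different numbers of alleles.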

The four-locus HLA haplotypes (HLA-A, -C, -B and -DRB1) with frequencies >0.5% in the Japanese population are listed in Table 1.20 These haplotypes account for only 32.7% of chromosomes in mainland groups, indicating that most Japanese individuals harbor short, segmented haplotypes. Haplotype frequency data from the National Marrow Donor Program (NMDP) for about 3 000 000 subjects, including many Asian groups in the United States,34 show that a substantial number of the HLA haplotypes that are common in Japanese are shared with Korean, Chinese, Vietnamese and Filipino populations (Table 1). This is consistent with a previous report comparing Japanese and Korean populations.35 Among them, only two haplotypes, A*26:01-C*03:04-B*40:02-DRB1*09:01 and A*33:03-C*14:03-B*44:03-DRB1*08:03, are virtually absent in the other Asian populations from the NMDP.34 Because similar haplotypes containing parts of these two haplotypes are observed in Asian populations, it seems that these two haplotypes were generated by recombination events and preserved in the Japanese population. In the next section, we discuss the spatial distributions of some of these HLA haplotypes in relation to differences among Japanese regional populations.

Table 1 The most common four-locus HLA haplotypes in mainland Japanese

Differences among Japanese regional populations on the basis of HLA alleles

Genetic differentiation across the Japanese Archipelago has been analyzed in several genetic systems, such as mitochondrial DNA,36, 37 the Y chromosome37, 38, 39, 40 and genome-wide single nucleotide polymorphisms.41, 42, 43 Most of these studies aimed to propose a model for the peopling of the Japanese Archipelago.

The origin of present-day Japanese has long been debated. It is thought that there were at least two waves of migration to the Japanese Archipelago. The ancestors of the Jomon people migrated to the Japanese Archipelago in the Upper Paleolithic age (approximately 30 000 years ago). The new migrants, the Yayoi people, came through the Korean Peninsula in the Aeneolithic period (1000 BC to 300 AD). The most prevailing model for the peopling of Japan is the admixture model or ‘dual structure model’, in which the modern Japanese population was formed by admixture between the Jomon and Yayoi people.44 The degree of admixture is thought to vary across the archipelago, which may influence the genetic structure of the present-day Japanese population.40 Here we introduce the results of studies on genetic differentiation across Japanese regional populations based on HLA alleles and haplotypes.

Phylogenetic studies using the frequencies of HLA alleles demonstrate that the Okinawa and Ainu people, who are thought to be descendants of indigenous Japanese, have close affinity to each other, and that mainland Japanese are located between the Okinawa and Ainu people on one side and East Asian populations, including the Korean population, on the other.27, 45 The finding that the haplotype A*24-B*54-DRB1*04:05, which is common in South East Asian populations, is prevalent in Okinawa but virtually absent in Ainu people suggests that there was gene flow from South East Asia to Okinawa after the ancestors of these two groups differentiated.45 Additionally, it has been reported that Nivkh people in Sakhalin contributed to the genetic constitution of Ainu people.37, 43 These factors may influence the genetic distance between Okinawa and Ainu people.45 Tokunaga et al.46 pointed out genetic links between Ainu people and Native Americans.

We examined the differences in HLA frequencies across 10 regional populations (Hokkaido, Tohoku, Kanto-Koshin, Hokuriku, Tokai, Kinki, Chugoku, Shikoku, Kyushu and Okinawa).20 Although genetic differentiation between Okinawa and mainland groups was notable, differences across mainland groups were also detected. We identified HLA alleles that were associated with the population structure of Japanese by means of the principal component scores (Figure 3). The identified HLA alleles were classified into four clusters according to their coordinates in the principal component score plot. The alleles of each cluster exhibited similar frequency distributions across regions (Figure 3). Interestingly, the alleles in each of these clusters reside on the same haplotypes. By examining the extent of LD between pairs of HLA alleles on the haplotypes, we found that the alleles in the two clusters characterized by low frequency in Okinawa exhibited stronger LD and were preserved as long haplotypes (A*24:02-C*12:02-B*52:01-DRB1*15:02-DPB1*09:01 and A*33:03-C*14:03-B*44:03-DRB1*13:02-DPB1*04:01) (Figure 4). The alleles whose frequencies were high in Okinawa, Shikoku and Kyushu showed intermediate levels of LD and formed the haplotype C*01:02-B*54:01-DRB1*04:05-DPB1*05:01 (Figure 4). LD between the alleles whose frequencies were highest in Okinawa was much weaker (Figure 4). Interestingly, some of these alleles were frequent in Ainu people (A*02:06, 20.0%; B*35:01, 11.0%).27 The haplotype A*02:06-B*35:01 was frequent in the Yupik people in Alaska (2.9%).47

Figure 3

Frequency distribution of HLA alleles associated with population structure of Japanese population. HLA alleles in the same row are classified into clusters CL1, CL2, CL3 and CL4. The figure is reconstructed from our previous study.20

Figure 4

Extent of linkage disequilibrium (D’) between pairs of HLA alleles in the same cluster represented in Figure 3.
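As a reminder of how D’ is obtained for a pair of alleles, a minimal sketch using Lewontin's normalization is given below. It takes the frequency of each allele and of the haplotype carrying both, computes D = h − p_a p_b and divides by the largest value D could take given the allele frequencies. The function name and the input values in the usage line are hypothetical and are not the data behind Figure 4.

```python
def d_prime(p_a, p_b, h_ab):
    """Lewontin's D' for one pair of alleles at two loci.

    p_a, p_b: frequencies of the two alleles (each strictly between 0 and 1).
    h_ab: frequency of the haplotype carrying both alleles.
    Returns a signed value between -1 and 1.
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    d = h_ab - p_a * p_b
    if d >= 0.0:
        # Upper bound on D given the allele frequencies.
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        # Bound on the magnitude of a negative D.
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    return d / d_max


if __name__ == "__main__":
    # Hypothetical frequencies for two alleles that almost always co-occur.
    print(round(d_prime(0.11, 0.11, 0.10), 3))
```

A value near 1 means that the rarer allele is found almost exclusively on haplotypes carrying the other allele, as expected for long haplotypes that have expanded recently; values decay toward 0 as recombination breaks the association.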

The long four-locus HLA haplotypes with frequencies >0.5% make up only 32.7% of chromosomes in mainland Japanese (Table 1), implying that the decay of LD generated short, segmented haplotypes during a long period of isolation of the Japanese population. The haplotypes whose constituent alleles were in strong LD (A*24:02-C*12:02-B*52:01-DRB1*15:02 and A*33:03-C*14:03-B*44:03-DRB1*13:02) were commonly found in Japanese and Koreans. A*24-B*52-DR15, the serological equivalent of A*24:02-B*52:01-DRB1*15:02, was shared with Mongolians.13 It is plausible that if a haplotype arises and then goes through rapid expansion, its constituent alleles will show strong LD.48, 49, 50 Therefore, it is suggested that these haplotypes were generated in North East Asia and the Korean Peninsula and then moved into mainland Japan, followed by rapid expansion, probably in the Yayoi period. The haplotype C*01:02-B*54:01-DRB1*04:05 was at its highest frequency in Okinawa and was shared with South East Asian populations. LD between pairs of its constituent alleles was at an intermediate level (that is, decayed compared with the abovementioned two haplotypes). These properties of C*01:02-B*54:01-DRB1*04:05 can be interpreted as the signature of gene flow from South East Asia that preceded the gene flow from the Korean Peninsula. These results may support an admixture model for the peopling of Japan involving at least three waves of migration.

New HLA alleles acquired by admixtures with ancient hominins

Next-generation sequencing technology has made it possible to determine whole-genome sequences from degraded DNA in ancient specimens, which settled a long controversy about interbreeding between modern humans and archaic hominins, such as Neanderthals and Denisovans.6, 7, 51, 52 Neanderthals contributed 1.5–2.1% of Eurasian genomes.52 Denisovan-derived DNA was estimated to make up 4–6% of the genomes of New Guineans and Aboriginal Australians but was of no significance in mainland Asia.51 At the same time, strong signatures of gene flow from Neanderthals to Denisovans were detected in the HLA region.52

Abi-Rached et al.53 searched for signatures of archaic admixture in HLA class I genes. The authors reconstructed HLA allotypes for Neanderthals and a Denisovan by reanalyzing whole-genome sequence data.6, 7 The Denisovan allotypes were predicted to be identical or very close to A*02:01/03/07/48, A*11:01/53, B*15:58, B*35:63, C*12:02 and C*15:02/05/17 in modern humans. As the HLA-B allotypes were virtually absent in modern humans, they focused attention on the distributions of all the plausible HLA-A-C haplotypes. These HLA-A-C haplotypes were present in modern Asia and Oceania, rare in Europe and virtually absent in Africa. Among the constituent alleles, A*11:01, C*12:02 and C*15 are prevalent in Asia and Oceania. Therefore, the authors concluded that modern humans acquired these alleles through admixture with Denisovans. Because an exceptionally divergent HLA allele, B*73:01, was in strong LD with C*15:05 in present-day humans, the authors speculated that B*73:01 was also derived from Denisovans.

Similarly, the haplotypes containing B*07, B*51, C*07:02 and C*16:02 were predicted to be derived from admixture with Neanderthals. Additionally, some HLA-A alleles exhibiting notable LD decay (A*11, A*26, A*02:01, A*02:06, A*24:02 and A*31:01) were predicted to be archaic alleles. As a consequence, the estimated contributions of the putative archaic HLA-A alleles reached >50% in Europeans, >70% in Asians and >95% in Papuans. Because these putative archaic alleles encode strong ligands for natural killer cell receptors, the authors concluded that the increased frequencies of these alleles in Eurasia could be interpreted as adaptive introgression.

Motivated by the fact that an amino-acid sequence motif in HLA-DPβ was shared between DPB1*04:01 and Neanderthals, Temme et al.54 proposed that DPB1*04:01 was the signature of an adaptive introgression from Neanderthals. However, phylogenetic analysis showed that DPB1*04:01 coalesced with DPB1*04:02 and the other modern human haplotypes before coalescing with the Neanderthal haplotype.55 Furthermore, DPB1*04:01 was common in sub-Saharan Africans, although its frequency was much lower than in Europeans.55 These findings suggest that the sharing between DPB1*04:01 and the Neanderthal sequence is probably a consequence of long-standing balancing selection or of interbreeding before the divergence of modern humans and Neanderthals. Several studies demonstrated that exceptionally divergent haplotypes in the immunity-related genes STAT2 and OAS1 were derived from Neanderthals and Denisovans as a consequence of adaptive introgression.56, 57, 58

The findings suggesting that modern humans acquired immune-related alleles from archaic hominins that had already adapted to local environments are very intriguing. The abovementioned HLA-A alleles are very common in the Japanese population. The contributions of all the predicted archaic HLA alleles in Japanese are shown in Table 2, indicating very high archaic contributions to HLA variation in present-day Japanese. These proportions are much higher than the archaic contribution to Eurasians averaged over the whole genome (1.5–2.1%).51, 52 The Japanese Archipelago is thought to be far from the putative places where the archaic admixture events occurred; therefore, the serial founder effect rather than positive selection could have increased the frequencies of these alleles. One caveat regarding the predicted ancient adaptive introgression is that the frequency of the putative Denisovan-derived C*12:02 is about 10% in present-day mainland Japanese, whereas the HLA-B alleles carried by the Denisovan, which are likely to be in LD with the HLA-C locus, are virtually absent. Additionally, the genome-wide contribution of Denisovans was estimated to be of no significance in Japanese.51 It is possible that some of these alleles have been preserved by long-standing balancing selection.59 More studies on the selective advantages of the archaic HLA alleles are needed.

Table 2 Contributions of the predicted archaic HLA alleles in Japanese population

Conclusion

The complex migration events after the Out-of-Africa expansion have shaped spatial gradients of HLA allele frequencies and diversity. The sharing of HLA haplotypes among populations can be used as a powerful tool to reconstruct past migration routes. However, it is notable that only a limited proportion of the overall variability in a population can be explained by common long haplotypes. In Japanese, the four-locus haplotypes whose frequencies are <0.5% account for about 70% of chromosomes. The short, segmented haplotypes produced by LD decay may provide useful information for tracing much older ancestral relationships between populations. Indeed, the segmented haplotype C*03:04-B*40:02, which is common in Japanese (mainland, 6.30%; Okinawa, 8.72%), seems to be one of the genetic footprints of the migration route of prehistoric populations from Asia to the New World.13, 20

Knowledge about the population structure of Japanese is essential for the identification of genotype–phenotype correlations. The substantial level of population stratification in the Japanese population, especially between Okinawa and the mainland groups, suggests a need for careful consideration in the design of HLA association studies. A study design in which case–control samples are stratified into two groups (mainland and Okinawa), association analyses are conducted independently in each group and the results are then integrated by meta-analysis may be useful for avoiding false-positive findings owing to population stratification.20, 41, 60
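The stratified design described above can be sketched as follows. This is one possible implementation, not a prescribed method: it assumes a simple 2 × 2 carrier table per stratum, a log odds ratio with Woolf's standard error, and a fixed-effect inverse-variance meta-analysis. The function names and all counts are hypothetical.

```python
import math


def log_odds_ratio(case_pos, case_neg, ctrl_pos, ctrl_neg):
    """Log odds ratio and its standard error (Woolf) for one stratum.

    case_pos / case_neg: cases carrying / not carrying the HLA allele.
    ctrl_pos / ctrl_neg: controls carrying / not carrying the HLA allele.
    """
    cells = (case_pos, case_neg, ctrl_pos, ctrl_neg)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive")
    log_or = math.log((case_pos * ctrl_neg) / (case_neg * ctrl_pos))
    se = math.sqrt(sum(1.0 / c for c in cells))
    return log_or, se


def fixed_effect_meta(estimates):
    """Inverse-variance pooled estimate from (log_or, se) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * lo for w, (lo, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


if __name__ == "__main__":
    # Hypothetical carrier counts, analyzed separately in each stratum.
    mainland = log_odds_ratio(120, 380, 80, 420)
    okinawa = log_odds_ratio(15, 85, 10, 90)
    pooled, pooled_se = fixed_effect_meta([mainland, okinawa])
    z = pooled / pooled_se
    print(round(math.exp(pooled), 3), round(z, 3))
```

Because cases are only compared with controls from the same stratum, a difference in allele frequency between Okinawa and the mainland cannot by itself generate an association signal; heterogeneity between strata would call for a random-effects model instead, which is not shown here.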

Improvements in sequencing technologies61, 62, 63 and large-scale genotyping projects34 have accelerated the accumulation of HLA frequency data. However, HLA frequency data differ among populations in several respects, including the genes genotyped, the typing resolution and the sample size. Current sequencing technologies allow the complete sequence of each HLA gene to be determined.63 High-resolution HLA sequencing will deliver considerably more value to several scientific fields, including medical genomics and population genetics.