Characterising private and shared signatures of positive selection in 37 Asian populations

Liu, Xuanyao; Lu, Dongsheng; Saw, Woei-Yuh; Shaw, Philip J; Wangkumhang, Pongsakorn; Ngamphiw, Chumpol; Fucharoen, Suthat; Lert-itthiporn, Worachart; Chin-inmanu, Kwanrutai; Chau, Tran Nguyen Bich; Anders, Katie; Kasturiratne, Anuradhani; de Silva, H Janaka; Katsuya, Tomohiro; Kimura, Ryosuke; Nabika, Toru; Ohkubo, Takayoshi; Tabara, Yasuharu; Takeuchi, Fumihiko; Yamamoto, Ken; Yokota, Mitsuhiro; Mamatyusupu, Dolikun; Yang, Wenjun; Chung, Yeun-Jun; Jin, Li; Hoh, Boon-Peng; Wickremasinghe, Ananda R; Ong, RickTwee-Hee; Khor, Chiea-Chuen; Dunstan, Sarah J; Simmons, Cameron; Tongsima, Sissades; Suriyaphol, Prapat; Kato, Norihiro; Xu, Shuhua; Teo, Yik-Ying

doi:10.1038/ejhg.2016.181

Download PDF

Article
Published: 18 January 2017

Characterising private and shared signatures of positive selection in 37 Asian populations

Xuanyao Liu^1,2,
Dongsheng Lu ORCID: orcid.org/0000-0002-8348-5159³,
Woei-Yuh Saw^2,4,
Philip J Shaw⁵,
Pongsakorn Wangkumhang⁵,
Chumpol Ngamphiw ORCID: orcid.org/0000-0002-8581-397X⁵,
Suthat Fucharoen⁶,
Worachart Lert-itthiporn^7,8,
Kwanrutai Chin-inmanu⁸,
Tran Nguyen Bich Chau⁹,
Katie Anders^9,10,
Anuradhani Kasturiratne¹¹,
H Janaka de Silva¹²,
Tomohiro Katsuya¹³,
Ryosuke Kimura¹⁴,
Toru Nabika¹⁵,
Takayoshi Ohkubo¹⁶,
Yasuharu Tabara¹⁷,
Fumihiko Takeuchi ORCID: orcid.org/0000-0003-3185-5661¹⁸,
Ken Yamamoto ORCID: orcid.org/0000-0002-1531-9685¹⁹,
Mitsuhiro Yokota²⁰,
Dolikun Mamatyusupu²¹,
Wenjun Yang²²,
Yeun-Jun Chung²³,
Li Jin²⁴,
Boon-Peng Hoh²⁵,
Ananda R Wickremasinghe¹¹,
RickTwee-Hee Ong²,
Chiea-Chuen Khor²⁶,
Sarah J Dunstan^9,10,27,
Cameron Simmons^9,10,28,
Sissades Tongsima ORCID: orcid.org/0000-0002-1491-1839⁵,
Prapat Suriyaphol^8,29,
Norihiro Kato¹⁸,
Shuhua Xu^3,30,31 &
…
Yik-Ying Teo^{1,2,4,18,26,32}

European Journal of Human Genetics volume 25, pages 499–508 (2017)Cite this article

2597 Accesses
21 Citations
3 Altmetric
Metrics details

Subjects

Genetic variation

Abstract

The Asian Diversity Project (ADP) assembled 37 cosmopolitan and ethnic minority populations in Asia that have been densely genotyped across over half a million markers to study patterns of genetic diversity and positive natural selection. We performed population structure analyses of the ADP populations and divided these populations into four major groups based on their genographic information. By applying a highly sensitive algorithm haploPS to locate genomic signatures of positive selection, 140 distinct genomic regions exhibiting evidence of positive selection in at least one population were identified. We examined the extent of signal sharing for regions that were selected in multiple populations and observed that populations clustered in a similar fashion to that of how the ancestry clades were phylogenetically defined. In particular, populations predominantly located in South Asia underwent considerably different adaptation as compared with populations from the other geographical regions. Signatures of positive selection present in multiple geographical regions were predicted to be older and have emerged prior to the separation of the populations in the different regions. In contrast, selection signals present in a single population group tended to be of lower frequencies and thus can be attributed to recent evolutionary events.

The genomic landscape of Mexican Indigenous populations brings insights into the peopling of the Americas

Article Open access 12 October 2021

Humberto García-Ortiz, Francisco Barajas-Olmos, … Lorena Orozco

North Asian population relationships in a global context

Article Open access 04 May 2022

Kenneth K. Kidd, Baigalmaa Evsanaa, … Andrew J. Pakstis

Genetic history from the Middle Neolithic to present on the Mediterranean island of Sardinia

Article Open access 24 February 2020

Joseph H. Marcus, Cosimo Posth, … John Novembre

Introduction

The HUGO Pan-Asian SNP Consortium (PASNP) was a successful model of a genomics research network in Asia, which surveyed the genetic make-up of 1928 participants from 73 populations and permitted an unprecedented insight into possible human migration across the Asian continent.¹ However, owing to the small number of participants in each population and the use of a low-density genotyping microarray, analyses of the PASNP data were mostly limited to broad analyses of population structure between the populations, and less is known about the cross-population microstructures such as admixture, linkage disequilibrium variation, as well as within-population adaptive forces such as natural selection and population bottlenecks. In fact, the sparse SNP density confounded several attempts to investigate the role of natural selection in driving genetic diversity between these populations, regardless of whether these relied on comparing allele frequencies² or locating long haplotypes.³

As the largest continent on Earth spanning over 93.5° in latitude and 196.3° in longitude, Asia possesses tremendous diversity in climate and altitude. This geographical variation, coupled with the myriad cultures and languages in Asia, has exerted profound influences on the genetics of Asian populations, allowing these populations to adapt differentially to high altitude,^{4, 5, 6, 7, 8} diet⁹ and infectious diseases.^{3, 10, 11} Given the vast expanse of Asia, populations that are geographically closer are expected to have adapted in a similar fashion, given the likelihood of more similar microclimates and dietary practices, whereas the selection on populations further apart may substantially differ. However, to date no collection of genetic data spanning multiple Asian populations exits at sufficient genomic coverage to allow a comprehensive and systematic evaluation of local adaptation in Asia.^{2, 12, 13, 14}

The Asian Diversity Project (ADP) overcomes this paucity in data by enlisting scientists across Asia to cooperate and share resources for a deeper survey of population genetics in Asia. To date, ADP has assembled genetic data for 37 cosmopolitan and ethnic minority populations from public databases^{15, 16, 17, 18, 19} and private repositories (Figure 1a), where there are at least half a million SNP markers genotyped in each population and 33 of the 37 populations surveyed at least 10 individuals (Supplementary Table S1).

Here, we aim to characterise the extent and sharing of positive selection in the 37 ADP populations. To do this, we first investigated the degree of genetic similarity between these populations with a comprehensive suite of population structure analyses, which supported the clustering of the Asian populations into four major population groups that corresponded broadly with the geographical distributions of the populations and present-day definitions of Asian populations. The genotype data for each population were subsequently phased with BEAGLE,²⁰ and the haplotype-based methods haploPS¹⁴ were employed on the phased data to locate genomic signatures of positive selection. The regions identified to be under selection were summarised according to the degree of sharing across populations in the four groups. We believe this is the first survey of positive selection in Asia that possesses sufficient statistical power to allow a systematic evaluation of the extent of sharing signatures of local adaptation. The genotype data have been deposited at the European Genome-phenome Archive (EGA, https://ega-archive.org) under the accession number EGAS00001002100.

Methods

Sample collection and genotyping in ADP

The ADP has integrated 37 Asian populations from publicly available databases such as (i) the four populations with Asian ancestries in Phase 3 of the International HapMap Project¹⁹ (HapMap: CHB, CHD, GIH, JPT); (ii) three populations from the Singapore Genome Variation Project¹⁶ (SGVP: CHS, INS, MAS); and four North Eurasian populations (Kurd, Buryat, Kyrgystani, Mongolian) from the Jorde Lab^{17, 18} (http://jorde-lab.genetics.utah.edu/?page_id=23). DNA samples of Tajik, Uyghur, Kazakh and Kirgiz were collected from the Xinjiang Uyghur Autonomous Region in China, whereas DNA samples of Hui (30) were collected from the Ningxia Hui Autonomous Region in China. All procedures were followed in accordance with the ethical standards of the ethics committee of Fudan University and the Helsinki Declaration of 1975, as revised in 2000. Data and information of the other ethnic populations from China (Dong, Han, Li, Tibetan, Yao, Zhuang) were described elsewhere as indicated.^{21, 22} The samples from eight Japanese prefectures (Amagasaki, Ehime, Fukuoka, Kita-nagoya, Ohasama, Okinawa,²³ Shimane, Tokyo) were coordinated through the National Center for Global Health and Medicine (NCGM) in Tokyo on behalf of the Japanese SNP Consortium. The NCGM was also responsible for coordinating the Sri Lankan data sets that included Sri Lanka Tamils, Indian Tamils and Sinhalese. A genetic data set of 200 Kinh Vietnamese was made available from the Oxford University Clinical Research Unit in Ho Chi Minh City, Vietnam, whereas two independent Thai data sets were contributed, respectively, by the Siriraj Hospital, Mahidol University, and Genome Technology Research Unit, the National Center for Genetic Engineering and Biotechnology (BIOTEC).

The samples from the HapMap and SGVP were genotyped on both the Affymetrix Genome-wide Human SNP array 6.0 and the Illumina Human1M BeadChip. Genotyping of the Chinese and Korean samples was performed only on the Affymetrix 6.0 array. The Japanese samples were genotyped on different platforms, with Ohasama on Affymetrix SNP 5.0, Tokyo and Shimane on Illumina 550k, Okinawa on the Illumina Omni-Express²³ and the rest of the populations on Illumina 2.5M. The three Sri Lankan populations were similarly genotyped on the Illumina 2.5M. Both sets of Thai samples were genotyped on the Illumina 610k, whereas the Vietnamese Kinhs were genotyped on the Illumina 660W. All coordinates reported here have been referenced against Build 37 (hg19).

Data sets from the contributing sites have undergone quality checks, which have excluded problematic samples and SNPs (Supplementary Table S2). Specific details of the QC process can be found from the respective publications. However, for the purpose of our intended analysis, we reiterate here that this QC process also excluded samples with a high level of identity-by-state (IBS) genotypes, where only the sample with the higher call rate was retained in each pair of individuals with excessive IBS.

Details of the 37 ADP populations, including the sample size in each population, can be found in Supplementary Table S1.

Data availability

The post-QC genotype data of the 37 populations in the ADP are available upon request to statyy@nus.edu.sg. As the ADP includes data sets from public resources, only the post-QC data will be shared for these populations and requests for the raw data set will be directed to the public resources. The raw genotype data for all other populations will however be available.

Population structure analyses

The analyses of the population structure for the 37 ADP populations were performed with four different strategies: (i) calculating the Weir and Hill F_ST between every pair of populations, defined as

with n_i the sample size of the ith population, n a constant determined by the number of samples and populations, and and “http://www.w3.org/1998/Math/ denote the average frequency of the allele in population i and both populations, respectively; (ii) a nearest-neighbour phylogenetic tree constructed with a genetic distance matrix where the (i, j)th element of the matrix is represented by the F_ST for population i and population j; (iii) a series of principal component analyses (PCAs) with 30 927 SNPs that are present in all 37 ADP populations as well as CEU and YRI from HapMap3 using smartPCA in the EIGENSOFT package;²⁴ and (iv) a series of admixture analyses with ADMIXTURE version 1.22²⁵ with the same 30 927 SNPs, where the number of ancestral populations (K) was allowed to vary from 2 to 12.

Haplotype phasing

The genotype data for the 33 ADP populations with at least 10 individuals were phased with BEAGLE version 3.3.2²⁶ to obtain the haplotype data necessary for the analysis of positive selection. Each population was phased independently of the others although internal validation indicated there were no significant differences in the phased haplotypes even if multiple populations were phased simultaneously (results not shown).

Locating genomic signatures of positive selection

The search for genomic signatures of positive selection was performed with the haplotype-based method haploPS,¹⁴ which was applied to the phased haplotype data from each of the 33 ADP populations and effectively relied on evidence of uncharacteristically long haplotypes when adjusted for background recombination.

HaploPS searches for the longest possible haplotype form that is present in the population with at least a pre-defined core haplotype frequency. At each of these frequencies, the identified haplotype form is measured by both the genetic distance spanned and the number of SNPs from the data that is present on the haplotype, where an adjusted haploPS score is subsequently derived from these two metrics after scaling against the number of such haplotypes present across the whole genome for the core frequency. In this study, to reduce the possitibility that false-positive signals of natural selection might confound the clustering analysis of the populations based on selection signals, we applied a stringent threshold for identifying positive selection regions by haploPS, where a genomic region is only identified to present evidence of positive selection if the adjusted haploPS score is <0.01 (see Discussion for sensitivity and specificity comparisons at this threshold). We conducted the search across a range of core haplotype frequencies from 5 to 95% in step-sizes of 5%. For a region identified by haploPS to be positively selected, combining the output across the range of core haplotype frequencies additionally allowed the putative frequency of the positively selected allele to be estimated. For a genomic region that is found to be positively selected in multiple populations, the individual selected haplotype forms can also be used to evaluate whether the emergence of the advantageous allele predates the separation of these populations, or is likely to have emerged independently in the different populations, also known as convergent evolution. The haploPS analysis used the software that is publicly available at http://www.statgen.nus.edu.sg/~haplops/.

HaploPS was chosen as it conferred higher statistical power and lower type I error than conventional methods for detecting positive selection.¹⁴ An equivalent analysis with two additional methods for detecting selection (iHS [12], XP-EHH [13]), along with a comparison of their results with those from haploPS can be found in the Supplementary Material.

Haplotype similarity index

Haplotype similarity index (HSI)¹⁴ is a metric to infer whether the selection signals shared by multiple populations originated from the same or different mutation events by quantifying the degree of similarity between the extended haplotype forms corresponding to a selection signal identified by haploPS that is concurrently present in multiple populations. A HSI larger or equal to 0.98 indicates high level of similarity and that the haplotype forms are likely to have descended from the same founder haplotype and thus indicative of a single-origin mutation event. HSI smaller than 0.9 indicates different haplotype backgrounds exist due to convergent evolution. Details regarding the formulation and evaluation of the HSI and associated thresholds can be found in Liu et al.¹⁴

Results

Population structure analyses

A total of 3933 samples from 37 Asian populations remained after quality control, of which we identified a set of 30 927 SNPs distributed across the genome that were present in all 37 populations. This data set is used for the analysis of population structure, together with the HapMap2 Europeans (CEU) and Africans (YRI) as outgroups. Using the F_ST values obtained from all possible pairwise comparisons of the 39 populations as inputs for a genetic distance matrix, we constructed a phylogenetic tree where YRI clearly segregated from the rest of the non-African populations, followed by CEU (Figure 1b). Among the Asian populations, the Kurdish and the Tajiks were genetically close to CEU (F_ST=1.4% and 1.9%, respectively), and all three populations were grouped together with South Asian populations (GIH, INS, Sinhalese, Indian Tamil, Sri Lankan Tamil) in the same clade in the phylogenetic tree. We thus defined this collection of populations as a population group corresponding to the South region. Central Asian samples such as the Kyrgyzstani, Buryat and Mongolians were grouped together with Chinese ethnic minorities that are located in the same geographical region (Uyghur, Kazak, Kirgiz, Tibetan and Hui) in the Central region. The classically defined East Asian samples from all eight Japanese populations, Korea and the Han Chinese samples (CHB, CHD, CHS, Han) were grouped into the East region, whereas the Southeast Asian populations from Thailand, Singapore, Vietnam and the ethnic minority populations of Dong, Li, Yao and Zhuang are grouped as the Southeast region. However, it should be highlighted that the distinction between these regional groupings, in particular between East and Southeast regions, are not as pronounced as those typically observed when comparing between globally distributed ancestry groups, due to the complex nature of migration and admixture between Asian populations (see below and Supplementary Material).

A PCA performed with the same 30 927 SNPs yielded findings that concurred with the phylogenetic analyses, separating the YRI and CEU from the rest of the ADP populations (Supplementary Figure S1A). The first two principal components from the PCA with only the 3933 ADP samples exhibited remarkable correlation with geographical coordinates with the latitude and longitude of the populations (Figure 1b, Supplementary Figure S1, Pearson r_{longitude-PC1}=0.79, P-value <4.9 × 10⁻³²⁴; r_latitude-PC1=0.58, P-value <4.9 × 10⁻³²⁴; Pearson r_{longitude−PC2}=0.40, P-value=9.9 × 10⁻¹⁶⁶; r_{latitude−PC2}=0.49, P-value=32 × 10⁻²⁷²). Owing to the ambiguity between the East Asian and Southeast Asian clades, we performed a separate admixture and PCA with the samples from Thailand, Vietnam, Singapore Malays and the Han Chinese populations (CHB, CHD, CHS). This indicated there is a considerable extent of genetic diversity in Thailand (Supplementary Figure S2), where the Thai people have varying degree of Han Chinese admixture ranging from 10 to 80%²⁷ (Figure 2 and Supplementary Figure S2), with some that are virtually indistinguishable from the Malays despite the Malays and the Han Chinese being genetically distinct (Figure 1b,Supplementary Figure S1). We also observed that samples from Okinawa were genetically distinct from the other seven mainland Japanese groups^{23, 28} (Figures 1b and 2, Supplementary Figure S1), consistent with the observation from the pairwise F_ST comparison (Supplementary Figure S3) where the minimum F_ST between Okinawa and the seven Japanese populations was 0.55%, whereas the maximum F_ST between any pairs of the seven Japanese populations was only 0.04%.

Detecting genomic signatures of positive selection

To investigate whether local adaptation may have driven the genetic diversity between Asian populations, we selected 33 populations from ADP with more than 10 samples each and analyse each population for evidence of positive selection with haploPS as it exhibits higher statistical power at locating genomic signatures of positive selection across different derived allele frequencies, while possessing lower false-positive rates across the majority of the allelic spectrum, compared with iHS and XP-EHH¹⁴ (see Discussion and Supplementary Material). Our analysis with haploPS identified 693 signals in total from the 33 populations, which were the basis for all subsequent analyses.

Quantifying the distribution of selection signals in Asia

We observed that the 693 selection signals (Supplementary Table S3) were located in 140 distinct genomic regions (Supplementary Table S4), where the majority of the signals were actually present in multiple populations but located in the same genomic regions. We thus investigated the pattern of sharing of the 140 selected regions among the 33 populations by calculating a similarity matrix for the 33 populations, where each entry indicates the number of regions out of 140 that are positively selected in the respective population pair. When a PCA was performed on this similarity matrix, the first two leading principal components indicated that population groups that were geographically closer were more likely to share the same selection signals than population groups located further apart (Figure 3). We observed that populations in the South and Central regions can similarly be grouped separately by the extent of sharing of selection signals, whereas there were considerable overlaps between the populations in the East and Southeast regions.

Subsequently, we investigated whether sharing of selection signals was more likely between specific population groups. Each of the 140 regions was categorised according to whether it was uniquely present in one population group, or whether it was present in specific combinations of two or more population groups (Figure 4a). Of the 140 regions, 76 were only found in populations in one unique population group (Supplementary Table S4). We observed that the greatest number of clade-specific signals was present in the South region that predominantly comprised of South Asian populations. This concurred with the previous observations that the South Asian populations are most distinct from the rest of the Asian populations. The least number of clade-specific signals was in the Southeast region, comprising predominantly of Southeast Asian populations. It is likely due to the greater extent of admixture observed between the populations in East and Southeast regions. For signals that were present in populations from the three population groups, the majority were located outside the South region, in agreement with earlier evidence that South Asian populations are considerably more distinct from the other Asian populations we have considered.

There were also five regions that were positively selected in all the four population groups, which included two regions in the major histocompatibility complex on chromosome 6 that were similarly identified to be selected in CEU. The remaining three regions were only selected in our Asian populations, and none of the five regions were selected in YRI, highlighting the rarity for evolutionary signals to be shared between Africans and populations from all the four Asian groups.

When haploPS identifies a selection signal, it also estimates the frequency of the (unknown) advantageous allele that is driving the selection. By grouping the selected signals according to the estimated frequencies into three categories (low: <30%; medium: 30–80%; and high: ≥80%), we observed that the majority of the signals that were present in all four ancestry clades were high frequency in nature (Figure 4b).

Sharing of selection signals: single-origin selection and convergent evolution across Asian populations

One feature of haploPS is the identification of the selected haplotype form, on which the beneficial allele is expected to reside. For a genomic region that is positively selected in multiple populations, haploPS compares the selected haplotype forms to deduce whether the shared selection signals were likely to have emerged from the same evolutionary event before the populations diverged. Should the selected haplotype forms be considerably different, then this suggests that separate mutation events have occurred independently on different haplotype backgrounds in different populations, a phenomenon defined as convergent evolution. We utilised the HSI to infer whether the selected haplotype forms across multiple populations were likely to have descended from the same founder haplotype (defined as HSI≥0.98) and thus indicative of a single-origin mutation event, or whether fundamentally different haplotype backgrounds exist (HSI<0.90) due to convergent evolution.

Of the 140 regions, 53 were positively selected in at least two populations from any of the four ancestry clades and have more than 20 SNPs in the overlapping regions. The majority (47/53) were inferred to have stemmed from the same evolutionary events given the HSIs for these regions were at least 0.98 (Supplementary Table S5). These included the classical signals: (i) EDAR on chromosome 2, which was positively selected in 11 ADP populations, including the majority of Japanese and Han Chinese populations, and which exhibited perfectly concordant haplotype forms (HSI=1.00) across all 11 populations (Figure 5a); and (ii) the skin pigmentation gene KITLG on chromosome 12, which was selected in 11 populations, and which similarly exhibited perfectly concordant haplotype forms (HSI=1.00, Figure 5b). The KEL gene that encodes the KELL blood group antigens was also observed to be selected in 12 populations from East Asian, Central Asian and Southeast Asian, likely stemming from the same evolutionary event as well given a HSI of 1.00 (Supplementary Figure S4).

Of the 53 regions that were positively selected in at least two populations, six regions demonstrated evidence of convergent evolution (Table 1), including a region spanning between 43.4Mb and 44.1Mb on chromosome 2 that encapsulated genes reported to be associated with lipid metabolism (THADA, ABCG5, ABCG8) and were positively selected in Japanese (Amagasaki, Ehime, Nagoya, Fukuoka) and Southeast Asians (Thailand, Vietnam).

Table 1 Genomic regions identified by haploPS to be under convergent evolution in specific populations in the Asian Diversity Project

Full size table

Evidence of localised adaptation and ancestry-specific selection

In addition to the genomic regions that were positively selected in multiple ancestry clusters, there were also 76 regions that were uniquely selected in just one ancestry cluster in the ADP. ABCC11 was identified to be positively selected in CHB and JPT. This gene carries a nonsynonymous SNP (hg19 chr16:g.48258198C>T, rs17822931) where homozygous carriers of the thymine allele possess dried earwax²⁹ and are more likely to be neutrally nonodorous,³⁰ which has characteristically been found in East Asian populations. The thymine allele at rs17822931 was found on the selected haplotype form that was perfectly identical in the two populations (HSI=1.00, Figure 5c).

Among the selection signals uniquely present in East Asian populations, the region containing the alcohol dehydrogenase (ADH) genes on chromosome 4 was selected in all the Japanese populations (JPT, Amagasaki, Ehime, Shimane, Nagoya, Okinawa, Fukuoka, Tokyo) and CHD. These ADH genes have been shown previously to protect against alcoholism,³¹ and we observed that the selected haplotype forms from these populations were perfectly identical (HSI=1.00).

Discussion

This is the first Pan-Asian population genetics survey of 37 cosmopolitan, ethnic minority and indigenous groups in Asia, where at least half a million variants across the human genome have been genotyped. By defining the ADP populations into four population groups, we were able to evaluate how many of the identified genomic regions were positively selected in more than one population groups. Two lines of evidence have also emerged from our analyses to suggest that regions under positive selection in multiple ancestry clades were likely to be older and have emerged prior to the separation of the ancestry clades: (i) the advantageous alleles were estimated to be at higher frequencies; and (ii) the haplotype similarity indices indicated the background haplotype carrying the advantageous alleles were highly concordant even between populations of different ancestry clades. Conversely, for regions that were positively selected uniquely in one ancestry clade, the advantageous alleles were estimated to be at lower frequencies, thus highly suggestive that the selection is very recent and confined to specific populations in the ancestry clade.

Such an unprecedented characterisation of positive selection in Asian populations relied on the confluence of three elements present in this study: (i) the deeper coverage of the genome provided by high-density genotyping arrays; (ii) 33 of the 37 populations surveyed at least 10 individuals each, thus providing greater confidence in estimating haplotype frequencies; and (iii) the availability of a highly sensitive algorithm (haploPS) to locate signatures of positive selection across almost the entire allele frequency spectrum. In addition, the ability to extract the specific haplotype form carrying the advantageous allele allows a direct comparison to be made between different populations that are similarly selected in the same genomic region, thus conferring the opportunity to determine whether the advantageous allele has emerged in a shared evolutionary event or in independent events due to convergent evolution. We highlight the contrast of the current findings with those from a previous analysis of the PASNP data, where only 59 regions were identified to be positively selected across 79 Asian populations due to the sparse data with only 50 000 markers.³ Here, we performed two separate analyses that leveraged on the density of the data: (i) the analysis of population structure relied on a common set of 30 927 SNPs across all the ADP populations; whereas (ii) the analysis of positive selection benefitted from the full set of SNP markers ranging from at least 500 000 SNPs up to 2.5 million in each population. The significantly higher SNP density has previously been shown to improve the detection of positive selection and to reduce false discoveries due to more accurate phasing.³

The clustering of the ADP populations according to the PCA of the positive selection signals yielded a pattern that is remarkably similar to that obtained by the standard PCA of genome-wide SNP data. We have attributed this to the sharing and differentiation of positive selection signals between the populations in the four ancestry clades. However, an alternative explanation is that haploPS may possess a high false-positive rate that resulted in neutrally evolving loci accounting for the majority of the discovered sites under selection. This inevitably meant the discovered regions were but a random sampling of genome-wide data, thus reproducing what has been seen in the conventional PCA of genome-wide SNP data.

To investigate the extent of false positives produced by haploPS (as compared with well-known methods such as iHS, XP-EHH), we performed a series of simulations producing demographies representative of neutral loci under three varying degrees of bottlenecks in Asian populations, and examined how often each of the three methods erroneously report evidence of positive selection. These relied on the same thresholds used to compare the statistical power of the three methods for successfully discovering sites under selection.¹⁴ We observed that haploPS reported significantly lower false-positive rates across all three scenarios (Supplementary Table S6). In particular, at stronger bottlenecks corresponding to inbreeding coefficients of 0.2 and 0.3, haploPS reported false-positive rates of 5.8% and 13.3%, respectively, compared with those by iHS (17.2% and 55.3%, respectively) and XP-EHH (4.4% and 8.7%, respectively). Notably, the investigation into the sharing and differentiation of positive selection signals was only made possible with the advent of haploPS,^{3, 14, 32, 33} given the higher statistical power with accompanying lower type I errors.

Our study also presented multiple lines of evidence to suggest that South Asian populations were considerably different from the rest of the Asian populations, which concurred with previous findings that South Asia is considerably more diverse than other Asian regions.^{32, 34, 35} This is contrary to the observations between populations in the other three ancestry clades, where the admixture analyses and PCA indicated there is a varying degree of mixing between these clades. The distinction between East and Southeast Asian populations was particularly ambiguous, in part due to Southeast Asian populations such as the Vietnamese Kinh and the Thais that were genetically close to the Han Chinese from East Asia.

Our survey of positive selection in the ADP populations has essentially relied on the detection of classic selective sweeps that produce distinctive imprints of long haplotypes in the genome. Hernandez et al³⁶ reported that such ‘hard’ selective sweeps are likely to be rare in contrast to polygenic selection or selection on standing variation, although we emphasise that our identification of 140 selected regions across 33 Asian populations is in line with their estimate that <10% of nonsynonymous variants are likely to be advantageous. Furthermore, we were able to consistently identify well-characterised signatures of positive selection, such as at EDAR, which has been functionally established to influence physical traits predominantly found in East Asians,^{37, 38, 39} which is a valuable reassurance providing proof-of-concept for our findings.

The ADP is the second genomics research network to be developed in Asia following the PASNP, and the present coverage of Asia by the ADP is noticeably incomplete. Numerous populations in Asia are rapidly shrinking in size, where an entire community can possess <50 individuals found within a few consanguineous households. Global efforts studying the population genetics of early human populations have focused predominantly on African communities, and surveys of Asian populations, particularly the indigenous groups in many parts of Southeast Asia, remained sparse and fragmented. Given the significant variations of climate, culture and environment in Asia, the population genetics of Asian populations remained surprisingly unexplored. Valuable insights into human migration and possible links between Asian populations and distant hominin cousins such as the Neanderthals and Denisovans are starting to emerge,^{40, 41, 42, 43, 44} and this may present the much needed motivation to coordinate efforts to study the population genetics of Asia.

References

Abdulla MA, Ahmed I, Assawamakin A et al: Mapping human genetic diversity in Asia. Science 2009; 326: 1541–1545.
Article CAS Google Scholar
Qian W, Deng L, Lu D, Xu S : Genome-wide landscapes of human local adaptation in Asia. PLoS One 2013; 8: e54224.
Article CAS Google Scholar
Liu X, Saw WY, Ali M, Ong RT, Teo YY : Evaluating the possibility of detecting evidence of positive selection across Asia with sparse genotype data from the HUGO Pan-Asian SNP Consortium. BMC Genomics 2014; 15: 332.
Article Google Scholar
Qi X, Cui C, Peng Y et al: Genetic evidence of paleolithic colonization and neolithic expansion of modern humans on the Tibetan plateau. Mol Biol Evol 2013; 30: 1761–1778.
Article CAS Google Scholar
Hanaoka M, Droma Y, Basnyat B et al: Genetic variants in EPAS1 contribute to adaptation to high-altitude hypoxia in Sherpas. PLoS One 2012; 7: e50566.
Article CAS Google Scholar
Simonson TS, McClain DA, Jorde LB, Prchal JT : Genetic determinants of Tibetan high-altitude adaptation. Hum Genet 2012; 131: 527–533.
Article Google Scholar
Peng Y, Yang Z, Zhang H et al: Genetic variations in Tibetan populations and high-altitude adaptation at the Himalayas. Mol Biol Evol 2011; 28: 1075–1081.
Article CAS Google Scholar
Simonson TS, Yang Y, Huff CD et al: Genetic evidence for high-altitude adaptation in Tibet. Science 2010; 329: 72–75.
Article CAS Google Scholar
Suo C, Xu H, Khor CC et al: Natural positive selection and north-south genetic diversity in East Asia. Eur J Hum Genet 2012; 20: 102–110.
Article CAS Google Scholar
Piel FB, Patil AP, Howes RE et al: Global distribution of the sickle cell gene and geographical confirmation of the malaria hypothesis. Nat Commun 2010; 1: 104.
Article Google Scholar
Karlsson EK, Harris JB, Tabrizi S et al: Natural selection in a Bangladeshi population from the cholera-endemic Ganges river delta. Sci Transl Med 2013; 5: 192ra186.
Article Google Scholar
Voight BF, Kudaravalli S, Wen X, Pritchard JK : A map of recent positive selection in the human genome. PLoS Biol 2006; 4: e72.
Article Google Scholar
Sabeti PC, Varilly P, Fry B et al: Genome-wide detection and characterization of positive selection in human populations. Nature 2007; 449: 913–918.
Article CAS Google Scholar
Liu X, Ong RT, Pillai EN et al: Detecting and characterizing genomic signatures of positive selection in global populations. Am J Hum Genet 2013; 92: 866–881.
Article CAS Google Scholar
Frazer KA, Ballinger DG, Cox DR et al: A second generation human haplotype map of over 3.1 million SNPs. Nature 2007; 449: 851–861.
Article CAS Google Scholar
Teo YY, Sim X, Ong RT et al: Singapore Genome Variation Project: a haplotype map of three Southeast Asian populations. Genome Res 2009; 19: 2154–2162.
Article CAS Google Scholar
Xing J, Watkins WS, Shlien A et al: Toward a more uniform sampling of human genetic diversity: a survey of worldwide populations by high-density genotyping. Genomics 2010; 96: 199–210.
Article CAS Google Scholar
Xing J, Watkins WS, Hu Y et al: Genetic diversity in India and the inference of Eurasian population expansion. Genome Biol 2010; 11: R113.
Article CAS Google Scholar
Altshuler DM, Gibbs RA, Peltonen L et al: Integrating common and rare genetic variation in diverse human populations. Nature 2010; 467: 52–58.
Article CAS Google Scholar
Browning BL, Browning SR : A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. Am J Hum Genet 2009; 84: 210–223.
Article CAS Google Scholar
Xu S, Li S, Yang Y et al: A genome-wide search for signals of high-altitude adaptation in Tibetans. Mol Biol Evol 2011; 28: 1003–1011.
Article Google Scholar
Lou H, Li S, Yang Y et al: A map of copy number variations in Chinese populations. PLoS One 2011; 6: e27341.
Article CAS Google Scholar
Sato T, Nakagome S, Watanabe C et al: Genome-wide SNP analysis reveals population structure and demographic history of the Ryukyu islanders in the southern part of the Japanese archipelago. Mol Biol Evol 2014; 31: 2929–2940.
Article CAS Google Scholar
Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D : Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 2006; 38: 904–909.
Article CAS Google Scholar
Alexander DH, Novembre J, Lange K : Fast model-based estimation of ancestry in unrelated individuals. Genome Res 2009; 19: 1655–1664.
Article CAS Google Scholar
Browning SR, Browning BL : Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet 2007; 81: 1084–1097.
Article CAS Google Scholar
Wangkumhang P, Shaw PJ, Chaichoompu K et al: Insight into the peopling of Mainland Southeast Asia from Thai population genetic structure. PLoS One 2013; 8: e79522.
Article CAS Google Scholar
Yamaguchi-Kabata Y, Nakazono K, Takahashi A et al: Japanese population structure, based on SNP genotypes from 7003 individuals compared to other ethnic groups: effects on population-based association studies. Am J Hum Genet 2008; 83: 445–456.
Article CAS Google Scholar
Yoshiura K, Kinoshita A, Ishida T et al: A SNP in the ABCC11 gene is the determinant of human earwax type. Nat Genet 2006; 38: 324–330.
Article CAS Google Scholar
Rodriguez S, Steer CD, Farrow A, Golding J, Day IN : Dependence of deodorant usage on ABCC11 genotype: scope for personalized genetics in personal hygiene. J Invest Dermatol 2013; 133: 1760–1767.
Article CAS Google Scholar
Chen CC, Lu RB, Chen YC et al: Interaction between the functional polymorphisms of the alcohol-metabolism genes in protection against alcoholism. Am J Hum Genet 1999; 65: 795–807.
Article CAS Google Scholar
Ali M, Liu X, Pillai EN et al: Characterizing the genetic differences between two distinct migrant groups from Indo-European and Dravidian speaking populations in India. BMC Genet 2014; 15: 86.
Article Google Scholar
Liu X, Yunus Y, Lu D et al: Differential positive selection of malaria resistance genes in three indigenous populations of Peninsular Malaysia. Hum Genet 2015; 134: 375–392.
Article CAS Google Scholar
Reich D, Thangaraj K, Patterson N, Price AL, Singh L : Reconstructing Indian population history. Nature 2009; 461: 489–494.
Article CAS Google Scholar
Wong LP, Lai JK, Saw WY et al: Insights into the genetic structure and diversity of 38 South Asian Indians from deep whole-genome sequencing. PLoS Genet 2014; 10: e1004377.
Article Google Scholar
Hernandez RD, Kelley JL, Elyashiv E et al: Classic selective sweeps were rare in recent human evolution. Science 2011; 331: 920–924.
Article CAS Google Scholar
Kamberov YG, Wang S, Tan J et al: Modeling recent human evolution in mice by expression of a selected EDAR variant. Cell 2013; 152: 691–702.
Article CAS Google Scholar
Fujimoto A, Kimura R, Ohashi J et al: A scan for genetic determinants of human hair morphology: EDAR is associated with Asian hair thickness. Hum Mol Genet 2008; 17: 835–843.
Article CAS Google Scholar
Kimura R, Yamaguchi T, Takeda M et al: A common variation in EDAR is a genetic determinant of shovel-shaped incisors. Am J Hum Genet 2009; 85: 528–535.
Article CAS Google Scholar
Reich D, Green RE, Kircher M et al: Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature 2010; 468: 1053–1060.
Article CAS Google Scholar
Krause J, Fu Q, Good JM et al: The complete mitochondrial DNA genome of an unknown hominin from southern Siberia. Nature 2010; 464: 894–897.
Article CAS Google Scholar
Lipson M, Loh PR, Patterson N et al: Reconstructing Austronesian population history in Island Southeast Asia. Nat Commun 2014; 5: 4689.
Article CAS Google Scholar
Reich D, Patterson N, Kircher M et al: Denisova admixture and the first modern human dispersals into Southeast Asia and Oceania. Am J Hum Genet 2011; 89: 516–528.
Article CAS Google Scholar
Shah AM, Tamang R, Moorjani P et al: Indian Siddis: African descendants with Indian admixture. Am J Hum Genet 2011; 89: 154–161.
Article CAS Google Scholar

Download references

Acknowledgements

We thank all the subjects in this study for their participation. YYT acknowledges the support of the Saw Swee Hock School of Public Health, the National University Health System and the Life Sciences Institute from the National University of Singapore. SX acknowledges financial support from the National Natural Science Foundation of China (91331204 and 31525014), the Key Research Program of Frontier Sciences of the Chinese Academy of Sciences (QYZDJ-SSW-SYS009), and the Program of Shanghai Academic Research Leader (16XD1404700). ST acknowledges support from the Thailand Research Fund mid-career search grant RSA5860081.

Author information

Authors and Affiliations

NUS Graduate School for Integrative Science and Engineering, National University of Singapore, Singapore, Singapore
Xuanyao Liu & Yik-Ying Teo
Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore
Xuanyao Liu, Woei-Yuh Saw, RickTwee-Hee Ong & Yik-Ying Teo
Max Planck Independent Research Group on Population Genomics, Chinese Academy of Sciences and Max Planck Society Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
Dongsheng Lu & Shuhua Xu
Life Sciences Institute, National University of Singapore, Singapore, Singapore
Woei-Yuh Saw & Yik-Ying Teo
National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency, Pathum Thani, Thailand
Philip J Shaw, Pongsakorn Wangkumhang, Chumpol Ngamphiw & Sissades Tongsima
Institute of Molecular Biosciences, Mahidol University, Nakhon Pathom, Thailand
Suthat Fucharoen
Faculty of Science, Molecular Medicine Graduate Programme, Mahidol University, Bangkok, Thailand
Worachart Lert-itthiporn
Division of Bioinformatics and Data Management for Research, Department of Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
Worachart Lert-itthiporn, Kwanrutai Chin-inmanu & Prapat Suriyaphol
Oxford University Clinical Research Unit, Hospital for Tropical Diseases, Ho Chi Minh City, Vietnam
Tran Nguyen Bich Chau, Katie Anders, Sarah J Dunstan & Cameron Simmons
Nuffield Department of Clinical Medicine, Centre for Tropical Medicine, University of Oxford, Oxford, UK
Katie Anders, Sarah J Dunstan & Cameron Simmons
Department of Public Health, Faculty of Medicine, University of Kelaniya, Ragama, Sri Lanka
Anuradhani Kasturiratne & Ananda R Wickremasinghe
Department of Medicine, Faculty of Medicine, University of Kelaniya, Ragama, Sri Lanka
H Janaka de Silva
Department of Clinical Gene Therapy, Osaka University Graduate School of Medicine, Suita, Japan
Tomohiro Katsuya
Department of Human Biology and Anatomy, Graduate School of Medicine, University of the Ryukyus, Nishihara-cho, Japan
Ryosuke Kimura
Department of Functional Pathology, Shimane University School of Medicine, Izumo, Japan
Toru Nabika
Department of Hygiene and Public Health, Teikyo University School of Medicine, Tokyo, Japan
Takayoshi Ohkubo
Center for Genomic Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan
Yasuharu Tabara
Department of Gene Diagnostics and Therapeutics, National Center for Global Health and Medicine, Tokyo, Japan
Fumihiko Takeuchi, Norihiro Kato & Yik-Ying Teo
Department of Medical Chemistry, Kurume University School of Medicine, Kurume, Japan
Ken Yamamoto
Department of Genome Science, School of Dentistry, Aichi Gakuin University, Nagoya, Japan
Mitsuhiro Yokota
College of the Life Sciences and Technology, Xinjiang University, Urumqi, China
Dolikun Mamatyusupu
Key Laboratory of Reproduction and Heredity of Ningxia Region, Ningxia Medical University, Yinchuan, China
Wenjun Yang
Department of Microbiology, Integrated Research Center for Genome Polymorphism, The Catholic University Medical College, Seoul, Korea
Yeun-Jun Chung
State Key Laboratory of Genetic Engineering and Ministry of Education (MOE), Key Laboratory of Contemporary Anthropology, School of Life Sciences, Fudan University, Shanghai, China
Li Jin
Faculty of Medicine and Health Sciences, UCSI University, Kuala Lumpur, Malaysia
Boon-Peng Hoh
Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
Chiea-Chuen Khor & Yik-Ying Teo
The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, Victoria, Australia
Sarah J Dunstan
Department of Microbiology and Immunology, The University of Melbourne, Melbourne, Victoria, Australia
Cameron Simmons
Institute of Personalized Genomics and Gene Therapy (IPGG), Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
Prapat Suriyaphol
School of Life Sciences and Technology, ShanghaiTech University, Shanghai, China
Shuhua Xu
Collaborative Innovation Center of Genetics and Development, Shanghai, China
Shuhua Xu
Department of Statistics and Applied Probability, National University of Singapore, Singapore, Singapore
Yik-Ying Teo

Authors

Xuanyao Liu
View author publications
You can also search for this author in PubMed Google Scholar
Dongsheng Lu
View author publications
You can also search for this author in PubMed Google Scholar
Woei-Yuh Saw
View author publications
You can also search for this author in PubMed Google Scholar
Philip J Shaw
View author publications
You can also search for this author in PubMed Google Scholar
Pongsakorn Wangkumhang
View author publications
You can also search for this author in PubMed Google Scholar
Chumpol Ngamphiw
View author publications
You can also search for this author in PubMed Google Scholar
Suthat Fucharoen
View author publications
You can also search for this author in PubMed Google Scholar
Worachart Lert-itthiporn
View author publications
You can also search for this author in PubMed Google Scholar
Kwanrutai Chin-inmanu
View author publications
You can also search for this author in PubMed Google Scholar
Tran Nguyen Bich Chau
View author publications
You can also search for this author in PubMed Google Scholar
Katie Anders
View author publications
You can also search for this author in PubMed Google Scholar
Anuradhani Kasturiratne
View author publications
You can also search for this author in PubMed Google Scholar
H Janaka de Silva
View author publications
You can also search for this author in PubMed Google Scholar
Tomohiro Katsuya
View author publications
You can also search for this author in PubMed Google Scholar
Ryosuke Kimura
View author publications
You can also search for this author in PubMed Google Scholar
Toru Nabika
View author publications
You can also search for this author in PubMed Google Scholar
Takayoshi Ohkubo
View author publications
You can also search for this author in PubMed Google Scholar
Yasuharu Tabara
View author publications
You can also search for this author in PubMed Google Scholar
Fumihiko Takeuchi
View author publications
You can also search for this author in PubMed Google Scholar
Ken Yamamoto
View author publications
You can also search for this author in PubMed Google Scholar
Mitsuhiro Yokota
View author publications
You can also search for this author in PubMed Google Scholar
Dolikun Mamatyusupu
View author publications
You can also search for this author in PubMed Google Scholar
Wenjun Yang
View author publications
You can also search for this author in PubMed Google Scholar
Yeun-Jun Chung
View author publications
You can also search for this author in PubMed Google Scholar
Li Jin
View author publications
You can also search for this author in PubMed Google Scholar
Boon-Peng Hoh
View author publications
You can also search for this author in PubMed Google Scholar
Ananda R Wickremasinghe
View author publications
You can also search for this author in PubMed Google Scholar
RickTwee-Hee Ong
View author publications
You can also search for this author in PubMed Google Scholar
Chiea-Chuen Khor
View author publications
You can also search for this author in PubMed Google Scholar
Sarah J Dunstan
View author publications
You can also search for this author in PubMed Google Scholar
Cameron Simmons
View author publications
You can also search for this author in PubMed Google Scholar
Sissades Tongsima
View author publications
You can also search for this author in PubMed Google Scholar
Prapat Suriyaphol
View author publications
You can also search for this author in PubMed Google Scholar
Norihiro Kato
View author publications
You can also search for this author in PubMed Google Scholar
Shuhua Xu
View author publications
You can also search for this author in PubMed Google Scholar
Yik-Ying Teo
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yik-Ying Teo.

Ethics declarations

Competing interests

The authors declare no conflict of interest.

Supplementary information

Supplementary Information (DOCX 707 kb)

Supplementary Table S3 (XLS 120 kb)

Supplementary Table S4 (XLS 57 kb)

Supplementary Table S5 (XLS 31 kb)

Rights and permissions

Reprints and permissions

About this article

Cite this article

Liu, X., Lu, D., Saw, WY. et al. Characterising private and shared signatures of positive selection in 37 Asian populations. Eur J Hum Genet 25, 499–508 (2017). https://doi.org/10.1038/ejhg.2016.181

Download citation

Received: 12 February 2016
Revised: 22 October 2016
Accepted: 01 November 2016
Published: 18 January 2017
Issue Date: April 2017
DOI: https://doi.org/10.1038/ejhg.2016.181

This article is cited by

Reconstructing the ancestral gene pool to uncover the origins and genetic links of Hmong–Mien speakers
- Yang Gao
- Xiaoxi Zhang
- Shuhua Xu
BMC Biology (2024)
PGG.SNV: understanding the evolutionary and medical implications of human single nucleotide variations in diverse populations
- Chao Zhang
- Yang Gao
- Shuhua Xu
Genome Biology (2019)

Subjects

Abstract

Similar content being viewed by others

Introduction

Methods

Sample collection and genotyping in ADP

Data availability

Population structure analyses

Haplotype phasing

Locating genomic signatures of positive selection

Haplotype similarity index

Results

Population structure analyses

Detecting genomic signatures of positive selection

Quantifying the distribution of selection signals in Asia

Sharing of selection signals: single-origin selection and convergent evolution across Asian populations

Evidence of localised adaptation and ancestry-specific selection

Discussion

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Competing interests

Supplementary information

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Search

Quick links