Introduction

The Philippines figures prominently in two significant demographic events in the Asia-Pacific region. The first was the initial colonization of Sunda and Sahul around 40–50 thousand years ago (KYA),1 with which the human fossil remains from Tabon Cave, Palawan, Philippines, with estimated ages ranging from 14 to 58 thousand years (KY),2 may be associated.3 The second was the Austronesian expansion, which is proposed to have originated in Taiwan around 5–6 KYA and to have spread first to the Philippines before continuing to Southeast Asia, Near and Remote Oceania, and Madagascar.4 The impact of the Austronesian expansion is evident in the fact that all Philippine languages today are Austronesian.5 In addition, a popularly accepted ‘waves of migration theory’ suggests that the origins and diversity of people in the Philippines were the result of several migrations into the archipelago during periods of low sea level, when land bridges connected the archipelago with the Sunda shelf via Palawan and the Sulu Archipelago, and with Celebes via Mindanao.6, 7

Currently, over 170 Filipino ethnolinguistic (FE) groups are recognized on the basis of language and culture.5 Of these FE groups, anthropologists have traditionally been most interested in those with short stature, frizzy hair, and dark skin color, historically defined as ‘Negrito’ groups.8 There are more than 20 FE Negrito (FEN) groups in the Philippines.5 It has been proposed that Negrito groups represent a separate migration out of Africa.9 Although all FEN groups currently speak Austronesian languages, it has been proposed that they originally spoke non-Austronesian languages, which were subsequently replaced by Austronesian languages.10 This view, along with their physical features and hunter-gatherer mode of subsistence, has resulted in the historical depiction of the FEN groups as the aboriginal, first occupants of the archipelago, and as such, the popular distinction of FEN vs non-Negrito FE (FEnN) groups has been perpetuated.7

To date, no comprehensive genetic diversity study of FE groups has been presented. A mitochondrial DNA (mtDNA) study on Philippine populations showed further support for the ‘out of Taiwan’ hypothesis for the Austronesian expansion, as well as identifying rare mtDNA haplotypes; however, samples were not from specific FE groups, but from urban populations, and from Filipinos living in Taiwan.11 Early studies have inferred genetic differences between some FEN groups (the Aeta of Bataan, Aeta of Zambales and the Mamanwa)12, 13 and possible affinities with populations, such as indigenous Australians and Papuan groups.12 A recent genome-wide study on Asian populations included six FEN groups, but only one other FEnN group, and three urban populations.14

We therefore report a survey of genetic variation in the non-recombining region of the human Y chromosome (NRY) in 16 FE groups (six FEN and 10 FEnN groups). Our results indicate that a simple division of FEN vs FEnN groups is not supported by the NRY evidence. Instead, FE groups in general demonstrate extensive heterogeneity and diverse genetic affinities with their geographic neighbors, suggesting a complex history that reflects the varying roles of migration from outside the Philippines, genetic drift, and admixture among FE groups and with other Asia-Pacific groups.

Materials and methods

Population samples

DNA samples were collected from 390 unrelated males from 16 FE groups, representing eight Philippine geographical regions and three language groupings (Figure 1, Supplementary Text). Population sampling was conducted independently by the DNA Analysis Laboratory, Natural Sciences Research Institute, University of the Philippines (UP-NSRI-DAL) and the Max Planck Institute for Evolutionary Anthropology (MPI-EVA), and the sample collections of both institutions were combined to form the data set in this study (Supplementary Text). Human population sampling procedures followed the guidelines outlined by the Philippine National Bioethics Committee on Scientific Research, the National Commission on Indigenous Peoples (NCIP) and the MPI-EVA. Free, prior and informed consent for sample collection was obtained at the level of the community and the individual.

Figure 1

Distribution of Y-SNP haplogroups observed among Filipino groups and the reference data set. The figure shows the frequency and geographical distribution across the Asia-Pacific region of the 11 Y-SNP haplogroups observed among Filipino groups. Pie chart size represents population sample size. Colors distinguish haplogroups within pie charts, with pie chart sections representing haplogroup frequencies. Each population group (East Asia, Southeast Asia, Melanesia, Fiji, Polynesia, Australia, and the Philippines) has a common color for its three-letter population codes. For the Philippines, labels give the language group name, [population code], and population sample size ‘n’. Labels in parentheses indicate geographical region (R#, CAR – Cordillera Administrative Region, ARMM – Autonomous Region of Muslim Mindanao) and language classification: northern Philippine language (NPL), central Philippine language (CPL), or southern Philippine language (SPL), based on the Ethnologue.5 Underlined Filipino group names and codes indicate Negrito groups. Population codes for the reference data set are enumerated in Supplementary Text.

DNA typing

DNA was extracted from whole-blood or oral samples (buccal swabs and/or saliva) using commercially available kits (the FTA Gene Guard system (Whatman Inc., Springfield Mill, Maidstone, Kent, UK) and the QIAamp DNA Blood Mini kit (QIAGEN Inc., Valencia, CA, USA)) and published protocols.15 DNA samples were typed for 48 NRY binary markers (Y-SNPs) (Supplementary Figure 1). One set of Y-SNP multiplexes, described previously,16, 17 was typed using the ABI Prism SNaPshot multiplex kit (Applied Biosystems, Foster City, CA, USA) (Supplementary Text, Supplementary Table 1). Another set of multiplexes was typed using a single-base extension (SBE) assay described elsewhere18 (Supplementary Text, Supplementary Table 1). Y-chromosome microsatellite/short tandem repeat (Y-STR) markers were typed using the PowerPlex Y system (Promega Corporation, Madison, WI, USA) following the manufacturer's instructions (Supplementary Text). For Y-SNP haplogroup nomenclature, the naming system of capital letter (major clade) followed by terminal mutation (NRY binary marker)19 was adopted.

Data analyses

For population comparisons, a reference data set was assembled from previously published works,20, 21, 22, 23, 24 composed of 1756 males from 60 groups representing five Asia-Pacific regions (Figure 1). Reference data set population codes and measures to ensure compatibility with the Filipino data set are discussed in Supplementary Text.

Arlequin version 3.11,25 (http://cmpg.unibe.ch/software/arlequin3) was used to calculate genetic diversity indices, namely haplogroup diversity (HgD), haplotype diversity (HtD), and haplotype mean pairwise difference (MPD), and to perform analysis of molecular variance (AMOVA). AMOVA was performed using different groupings (geography, language classification, FEN vs FEnN, and all groups collectively (Figure 1)). Mann–Whitney U tests and correspondence analyses (CA) were performed using the STATISTICA software package.26 Network analyses were performed using Network version 4.510 and Network Publisher version 1.1.0.6 (http://fluxus-engineering.com). The BATWING program27 (http://www.mas.ncl.ac.uk/~nijw/) was used to estimate haplogroup coalescent times (time since the most recent common ancestor, TMRCA). The IM program28 (http://genfaculty.rutgers.edu/hey/software) was used to estimate divergence times and migration rates. Details of the network analyses and of the estimation of coalescent times, divergence times, and migration rates are given in Supplementary Text.
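
The diversity indices named above have standard closed forms. As a rough illustration (not Arlequin's code), the sketch below computes Nei's unbiased diversity, the usual formula behind HgD and HtD, and a mean pairwise difference for Y-STR haplotypes. It assumes each haplotype is a tuple of repeat counts and that the distance between two haplotypes is the number of loci with unequal alleles; Arlequin offers other distance options for STR data, so this is not necessarily the setting used here. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def nei_diversity(labels):
    """Unbiased diversity n/(n-1) * (1 - sum p_i^2) over haplogroup or haplotype labels."""
    n = len(labels)
    if n < 2:
        raise ValueError("diversity needs at least two samples")
    sum_p2 = sum((c / n) ** 2 for c in Counter(labels).values())
    return n / (n - 1) * (1 - sum_p2)


def mean_pairwise_difference(haplotypes):
    """Average number of loci at which two haplotypes differ, over all unordered pairs."""
    pairs = list(combinations(haplotypes, 2))
    if not pairs:
        raise ValueError("MPD needs at least two haplotypes")
    diffs = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return diffs / len(pairs)


# A group fixed for one haplogroup has zero diversity, as for the groups fixed for K-M9 or O-M110.
print(nei_diversity(["K-M9"] * 5))                    # 0.0
print(round(nei_diversity(["C", "C", "K", "O"]), 4))  # 0.8333
# Hypothetical three-locus haplotypes (repeat counts), not study data.
print(mean_pairwise_difference([(14, 23, 11), (14, 23, 12), (15, 22, 12)]))  # 2.0
```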

Results

NRY diversity and genetic relationships of Filipino ethnolinguistic groups

Eleven Y-SNP haplogroups were observed among 390 unrelated males from 16 FE groups (Figure 1). Five major haplogroups (C-RPS4Y, K-M9, O-M119, O-M110, and O-M122) had frequencies >5% and were present in more FE groups than the six minor haplogroups (NO-M214, O-M175, O-M7, O-M95, O-M134, and P-M74), each with <5% frequency (Table 1, Figure 1). In general, all haplogroups show a patchy distribution (Figure 1): even the most frequent haplogroup, O-M119, observed among northern, central, and southern FE groups, was present in only 12 of the 16 FE groups. This distribution may be the result of genetic drift, which is readily detectable with the NRY because of its small effective population size.
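
To make the major/minor split and the "present in 12 of 16 groups" tally concrete, the following sketch computes, from per-group counts, each haplogroup's pooled frequency, the number of groups carrying it, and a major/minor label. It assumes the 5% threshold applies to the frequency pooled over all sampled males, which the text implies but does not state explicitly; the group names and counts are hypothetical.

```python
def summarize_haplogroups(counts_by_group, threshold=0.05):
    """Pooled frequency, number of carrier groups, and a major/minor label per haplogroup."""
    pooled = {}
    carriers = {}
    n_total = 0
    for counts in counts_by_group.values():
        for hg, c in counts.items():
            n_total += c
            pooled[hg] = pooled.get(hg, 0) + c
            if c > 0:
                carriers[hg] = carriers.get(hg, 0) + 1
    return {
        hg: {
            "freq": c / n_total,
            "n_groups": carriers.get(hg, 0),
            "class": "major" if c / n_total > threshold else "minor",
        }
        for hg, c in pooled.items()
    }


# Toy counts for three hypothetical groups (not the study data).
toy = {
    "G1": {"O-M119": 9, "K-M9": 1},
    "G2": {"O-M119": 10},
    "G3": {"C-RPS4Y": 10},
}
for hg, s in summarize_haplogroups(toy).items():
    print(hg, round(s["freq"], 3), s["n_groups"], s["class"])
```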

Table 1 Frequency of Y-SNP haplogroups observed among Filipino groups

For genetic diversity indices (Table 2), HgD ranged from 0 (in the Aeta of Zambales and the Tadyawan, who were fixed for haplogroups K-M9 and O-M110, respectively) to 0.83. Y-STR diversity indices also ranged widely: HtD from 0.59 (in the Tadyawan) to 0.97 (Ivatan, CAR, and Surigaonon), and MPD from 1.40 (again in the Tadyawan) to 4.63 (Agta). Extremely low HgD values were observed for some FEN groups (Aeta of Zambales and Bataan) and FEnN groups (Bugkalot, Tadyawan, and Tawbuid), further suggesting genetic drift owing to small population sizes. Based on Mann–Whitney U tests, the genetic diversity indices of FEN groups did not differ significantly from those of FEnN groups (HgD: Z=0.52, P=0.61; HtD: Z=−0.32, P=0.75; MPD: Z=1.10, P=0.27), implying no distinction between FEN and FEnN groups based on NRY diversity.
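
The FEN vs FEnN comparison relies on the Mann–Whitney U test. The sketch below is a generic large-sample version (normal approximation with average ranks for ties, but no tie or continuity correction to the variance), so for the actual data its Z may differ slightly from STATISTICA's output. It also shows how a reported Z maps to a two-sided P.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values receiving the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def two_sided_p(z):
    """Two-sided P value for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))


def mann_whitney(x, y):
    """Return U for sample x, its normal-approximation Z (no tie correction), and P."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return u, z, two_sided_p(z)


# The reported Z for MPD maps back to the reported P.
print(round(two_sided_p(1.10), 2))  # 0.27
```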

Table 2 Genetic diversity indices estimated for Filipino groups

AMOVA analyses showed extensive heterogeneity among FE groups, among FEN groups, and among FEnN groups, with only about 70% of the variation observed within populations in each comparison (Table 3). Only a geographic grouping resulted in a slightly larger difference among groups than among populations within groups. Furthermore, the grouping of FEN vs FEnN also provided a poor description of genetic structure, consistent with the absence of a significant difference between FEN and FEnN diversity indices.
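
Table 3 reports hierarchical AMOVA computed in Arlequin. The sketch below shows only the core one-level partition (among vs within populations) using the standard sums-of-squared-deviations approach, with an assumed 0/1 squared distance between haplogroup labels; the group level and the permutation tests used for significance are omitted, and the function names are hypothetical.

```python
from itertools import combinations


def mismatch(a, b):
    """Squared distance between two haplogroup labels: 0 if identical, 1 otherwise."""
    return 0.0 if a == b else 1.0


def amova_one_level(populations, dist=mismatch):
    """Partition variance among and within populations (no group level, no permutation test)."""
    pooled = [h for pop in populations for h in pop]
    n, p = len(pooled), len(populations)
    # Sums of squared deviations: pairwise squared distances divided by sample size.
    ssd_total = sum(dist(a, b) for a, b in combinations(pooled, 2)) / n
    ssd_within = sum(
        sum(dist(a, b) for a, b in combinations(pop, 2)) / len(pop) for pop in populations
    )
    ssd_among = ssd_total - ssd_within
    n0 = (n - sum(len(pop) ** 2 for pop in populations) / n) / (p - 1)
    var_within = ssd_within / (n - p)
    var_among = (ssd_among / (p - 1) - var_within) / n0
    total = var_among + var_within
    return {
        "phi_st": var_among / total,
        "pct_among": 100 * var_among / total,
        "pct_within": 100 * var_within / total,
    }


# Two hypothetical populations scored for haplogroups.
result = amova_one_level([["K", "K", "K", "O"], ["O", "O", "O", "O"]])
print({k: round(v, 3) for k, v in result.items()})
```

With real data, a "within populations" share of about 70%, as reported for the FE comparisons, corresponds to roughly 30% of the variance lying among populations or groups.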

Table 3 Analysis of molecular variance (AMOVA) results

CA based on Y-SNP haplogroup frequencies was used to visualize the genetic relationships of 14 Filipino groups (those with sample sizes >10) in association with the distribution of Y-SNP haplogroups. The CA plot (Figure 2) showed three clusters of FE groups: one consisting of the Aeta of Zambales, Aeta of Bataan, and Hanunuo, associated with haplogroup K-M9; one consisting of the Ati, Kalangoya, Tawbuid, and Tadyawan, associated with haplogroups O-M7, O-M110, and O-M134; and a third consisting of the remaining FE groups and haplogroups. FEN groups showed no tendency to cluster together, as they were distributed across all three clusters. Nor did the three clusters reflect geographic relationships, with the exception of the three southern groups (Surigaonon, Manobo, and Mamanwa), all from northern Mindanao, which cluster together.
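
Figure 2 was produced with STATISTICA. To illustrate the idea, the sketch below implements textbook correspondence analysis (singular value decomposition of the standardized residuals of a populations × haplogroups count table) with NumPy. It is not STATISTICA's implementation, and sign or scaling conventions of the plotted coordinates may differ; the counts are hypothetical.

```python
import numpy as np


def correspondence_analysis(counts):
    """Correspondence analysis of a populations x haplogroups count table via SVD.

    Returns row (population) and column (haplogroup) principal coordinates and the
    fraction of total inertia explained by each dimension. Rows and columns with
    zero totals must be removed beforehand.
    """
    table = np.asarray(counts, dtype=float)
    p = table / table.sum()
    r = p.sum(axis=1)  # row masses
    c = p.sum(axis=0)  # column masses
    expected = np.outer(r, c)
    # Standardized residuals; their squared sum is the total inertia (chi-square / n).
    s = (p - expected) / np.sqrt(expected)
    u, sv, vt = np.linalg.svd(s, full_matrices=False)
    inertia = sv ** 2
    explained = inertia / inertia.sum()
    rows = (u * sv) / np.sqrt(r)[:, None]
    cols = (vt.T * sv) / np.sqrt(c)[:, None]
    return rows, cols, explained


# Hypothetical counts: three populations x three haplogroups.
toy = [[10, 0, 2], [1, 8, 3], [2, 2, 9]]
rows, cols, explained = correspondence_analysis(toy)
print(np.round(explained[:2], 3))  # share of inertia on dimensions 1 and 2
```

The `explained` values play the role of the percentages shown in parentheses on the axes of Figure 2.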

Figure 2

CA plot based on Y-SNP haplogroup frequencies of Filipino groups. Percentages in parentheses indicate the fraction of the variance explained by each dimension. Haplogroups are in gray text, while population codes are in black text. Population codes are the same as in Figure 1. All groups with sample sizes <10 (Figure 1) were excluded from the analyses.

Network (Figure 3) and haplotype sharing analyses (Supplementary Figure 2) showed that haplotypes in the major haplogroups O-M119, O-M110, and O-M122 were shared among 14 FE groups, regardless of geographic proximity. In contrast, shared haplotypes in haplogroups C-RPS4Y and K-M9 were distributed among nine FE groups and, for the most part, involved groups that are geographically close. These patterns seem to reflect different histories for these haplogroups and suggest that C-RPS4Y and K-M9 are likely to be older, whereas O-M119, O-M110, and O-M122 spread more recently. TMRCA estimates (in years) and their credible intervals (CI) were consistent with C-RPS4Y (13 245; CI: 7824–20 644) and K-M9 (13 608; CI: 9225–19 709) being older than O-M119 (7299; CI: 5046–10 204), O-M110 (4167; CI: 2837–5898), and O-M122 (6223; CI: 4151–8886).

Figure 3

Networks for Y-SNP haplogroups C-RPS4Y, K-M9, O-M119, O-M110, and O-M122. Networks are based on seven haplogroup-associated Y-STR loci for Filipino groups and the reference data set. Circles represent Y-STR haplotypes, with the area of the circle proportional to the frequency of the haplotype. A circle with more than one color indicates a haplotype that is shared between different populations. Nodes without circles that connect branches are median vectors that represent inferred ancestral haplotypes. Branch lengths are proportional to the number of mutational steps. Marked K-M9 network branches (b1–b7) highlight the different associations of FE groups in the K-M9 network. For all networks, the specific FE groups involved are identified in the main text.

Genetic affinities of Filipino ethnolinguistic groups

The FE group data set was compared with a reference data set composed of 1756 males from 60 groups across the Asia-Pacific region.20, 21, 22, 23, 24 Figure 1 shows the distribution of shared Y-SNP haplogroups among FE groups and the reference data set. AMOVA results (Table 3) show that the heterogeneity of FE groups was comparable to or exceeded that of the other regions in the reference data set. A CA plot (Figure 4) shows that FE groups have different affinities. The major O haplogroups appear to influence the association of the majority of FE groups with East Asian and Southeast Asian groups and, to a lesser extent, with Melanesian and Polynesian groups. Haplogroups C-RPS4Y and K-M9 drive the association of several FEN groups (Aetas, Ati, and Agta) and the Hanunuo with Southeast Asian and Australian groups. For the major O haplogroups O-M119, O-M110, and O-M122, Y-STR diversity indices (HtD and MPD) (Supplementary Table 3) were similar for the FE, East Asian, and Southeast Asian groups. Network (Figure 3) and haplotype sharing (Supplementary Table 4) analyses for these three haplogroups showed haplotype sharing between FE groups and all Asia-Pacific groups, with the exception of haplogroup O-M119, for which FE groups shared haplotypes only with East and Southeast Asian groups. For haplogroup C-RPS4Y, network analysis (Figure 3) showed that FEnN (Ivatan, Manobo, and Surigaonon) and FEN (Ati and Mamanwa) haplotypes were associated with Southeast Asian haplotypes on one branch, while on another branch, FEN haplotypes (one Ati and all Agta C-RPS4Y haplotypes) were associated with indigenous Australians from Arnhem Land, one Fijian, and several Southeast Asians. This implies two different origins of haplogroup C-RPS4Y in FE groups: one associated with Southeast Asians, and another providing a possible signal of a genetic link between FEN groups (in particular the Agta, who have the highest frequency of C-RPS4Y among FE groups) and indigenous Australians.
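
Haplotype sharing, as used for Figure 3 and Supplementary Table 4, amounts to asking which exact Y-STR haplotypes occur in more than one population. A minimal sketch, assuming haplotypes are compared as exact matches across the same ordered set of loci; the population labels and repeat counts are hypothetical.

```python
from itertools import combinations


def shared_haplotypes(samples):
    """Map each pair of populations to the set of Y-STR haplotypes observed in both."""
    distinct = {pop: set(haps) for pop, haps in samples.items()}
    shared = {}
    for a, b in combinations(sorted(distinct), 2):
        common = distinct[a] & distinct[b]
        if common:
            shared[(a, b)] = common
    return shared


# Hypothetical populations and three-locus haplotypes (repeat counts).
samples = {
    "PopA": [(13, 23, 10), (13, 23, 11)],
    "PopB": [(13, 23, 11), (14, 22, 10)],
    "PopC": [(15, 24, 12)],
}
print(shared_haplotypes(samples))
```
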
It should be noted that the indigenous Australian groups in the reference data set possess, at high frequency, a unique DYS390.1 deletion on the background of haplogroup C-RPS4Y;20 these samples were therefore excluded from the C-RPS4Y analysis. The DYS390.1 deletion has been associated with DYS390 allele lengths ranging from 18 to 22 repeats and so far is specific to Australians.20, 24, 29 Several FE groups (Ivatan, Agta, Ati, Surigaonon, Mamanwa, and Manobo) have C-RPS4Y/DYS390 allele lengths within this range (Supplementary Table 2). Although sequence analysis would be required to definitively rule out the DYS390.1 deletion in these individuals, network analysis does not show any association between these FE group haplotypes and the Australian DYS390.1 haplotypes (Supplementary Figure 3). With only three C-RPS4Y haplotypes shared (two among FE groups and one between FE and Southeast Asian groups) (Figure 3 and Supplementary Table 4), these NRY links between FE, Southeast Asian, and Australian groups are likely to be old. Haplogroup K-M9 was shared between FE groups and different Asia-Pacific regions, but within K-M9, FE groups showed different affinities, as seen on different branches of the K-M9 network (Figure 3, b1–b7). FEN (Iraya and Mamanwa) and FEnN (Maranao and Manobo) haplotypes clustered with Southeast Asian and Melanesian haplotypes on b1 and b2. FEnN (Surigaonon and Manobo) haplotypes clustered with Southeast Asian, Melanesian, and Fijian haplotypes on b3. All Ati (FEN) haplotypes clustered with one Hanunuo (FEnN) haplotype on b4. FEN (two Aeta groups and the Agta) haplotypes were associated with Hanunuo (FEnN) and Southeast Asian haplotypes on b5. FEN (two Aeta groups) haplotypes were associated with one Polynesian, one Fijian, one Melanesian, and six Australian haplotypes on b6. Only FEN (two Aeta groups and the Agta) haplotypes clustered on b7.
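
The screening step described above (flagging C-RPS4Y individuals whose DYS390 allele length falls within the 18–22 repeat range associated with the DYS390.1 deletion) can be sketched as a simple filter. The record layout, field names, and sample IDs are assumptions for illustration; a hit only marks a candidate, since allele length alone cannot confirm or exclude the deletion.

```python
# Allele-length window associated with the DYS390.1 deletion, as stated in the text.
DYS390_1_WINDOW = (18, 22)


def dys390_candidates(haplotypes, window=DYS390_1_WINDOW):
    """Return IDs of C-RPS4Y haplotypes whose DYS390 length falls in the deletion-associated window.

    A hit is only a candidate: allele length alone cannot distinguish a DYS390.1
    deletion from an ordinary allele, so sequencing is still required.
    """
    lo, hi = window
    return [h["id"] for h in haplotypes if lo <= h["DYS390"] <= hi]


# Hypothetical records; the IDs and allele values are illustrative only.
records = [
    {"id": "sample1", "DYS390": 21},
    {"id": "sample2", "DYS390": 24},
    {"id": "sample3", "DYS390": 18},
]
print(dys390_candidates(records))
```
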
Overall, the comparisons of FE groups with the reference data set for the major O haplogroups, C-RPS4Y, and K-M9 revealed diverse genetic affinities of FE groups. These comparisons also supported the genetic differences among all FE groups observed in the genetic diversity, AMOVA, and CA results.

Figure 4

CA plot based on haplogroup frequencies of Filipino groups and the reference data set. Percentages under each dimension label indicate the fraction of the variance explained by each dimension. All groups with sample sizes <10 (Figure 1, Supplementary Text) were excluded from the analyses. Haplogroups are in gray text, while population codes are in black text. Population codes are the same as in Figure 1 and enumerated in Supplementary Text.

Negrito – indigenous Australian association: ancient divergence or recent contact?

As there appear to be signals of genetic links between several FEN groups (Aeta-Bataan, Aeta-Zambales, and Agta) and indigenous Australians (Arnhem Land and Great Sandy Desert) (Figure 3: C-RPS4Y and K-M9; Figure 4), divergence times and migration rates between these groups were estimated through pairwise, simulation-based analyses using the IM program.28 Divergence time estimates ranged from 14 549 to 19 799 years (CI: 4939–75 059), and gene flow was also detected from the Arnhem Land Australians to the Aeta of Bataan and the Agta, and from the Great Sandy Desert Australians to the Agta (Table 4 and Supplementary Figure 4). Point estimates indicate divergence times of about 15–20 KYA, after the initial colonization of the Asia-Pacific region around 40–50 KYA,1 although the CIs overlap with the initial colonization period. The time estimates therefore do not distinguish between ancient and more recent divergence of the ancestors of FEN and indigenous Australian groups.
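
IM reports divergence time and migration on a mutation-scaled basis, hence the "scaled estimates" of Table 4. The sketch below assumes the common IM convention that t is scaled as t·μ and migration as m/μ, where μ is a per-generation mutation rate; the mutation rate and generation time are placeholders, not the values used in this study. It also shows the interval comparison behind the statement that the CI overlaps the initial-colonization window while the point estimates do not.

```python
def scaled_time_to_years(t_scaled, mu_per_generation, generation_years):
    """Convert an IM divergence time scaled by mutation rate (t * mu) into years."""
    return t_scaled / mu_per_generation * generation_years


def scaled_migration_to_rate(m_scaled, mu_per_generation):
    """Convert an IM migration parameter scaled as m / mu into a per-generation rate."""
    return m_scaled * mu_per_generation


def intervals_overlap(a, b):
    """True if closed intervals a = (lo, hi) and b = (lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Placeholder inputs, not the values behind Table 4.
print(scaled_time_to_years(0.1, mu_per_generation=0.002, generation_years=25))
# The reported CI (4939-75 059 years) overlaps the 40-50 KYA colonization window.
print(intervals_overlap((4939, 75059), (40000, 50000)))  # True
```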

Table 4 Scaled estimates of divergence times (T) and migration rates (M1 and M2)

Discussion

Our analyses revealed several attributes of FE group NRY lineages, namely extensive heterogeneity, signatures of old and recent periods, and diverse affinities. Heterogeneity is seen in FEN groups, who show an old substrate (haplogroups C-RPS4Y and K-M9) suggesting ancient shared ancestry. Similarities in the physical features and mode of subsistence of FEN groups have been the main, if not the only, anthropological support for inferring their common ancestry.30 Haplogroups K-M9 and C-RPS4Y in this study provide genetic support for the common ancestry of FEN groups. FEN groups also share haplogroups with FEnN groups (ie, Mamanwa with Manobos and Surigaonons; Aeta with Hanunuo), but in these cases early anthropological studies have inferred gene flow between FEN and FEnN groups rather than ancient shared ancestry.30 Heterogeneity among FEnN groups is best exemplified by the Hanunuo, Tadyawan, Tawbuid, and Iraya. Historically grouped together and called ‘Mangyan’ (with the Iraya also considered a FEN group because of their physical features), these groups showed distinct NRY lineages despite their geographical proximity on the island of Mindoro; this may reflect differing cultural practices and language barriers between them.30 Owing to readily distinguishable physical features that may be the result of adaptation to the environment,31 confounded by historical and social circumstances, a general grouping of FE groups into FEN and FEnN has been perpetuated.7 However, we find that, from an NRY perspective, the extensive heterogeneity of FE groups, influenced by drift and admixture, does not support a simple distinction between FEN and FEnN groups.

As recently summarized, haplogroups C-RPS4Y and K-M9 represent founding lineages in the Asia-Pacific region, while O-M119, O-M110, and O-M122 are associated with the Austronesian expansion.32 The phylogeographic analyses of C-RPS4Y and K-M9 in FE groups suggest that these are old NRY lineages. TMRCA estimates for C-RPS4Y (13 245 years; CI: 7824–20 644) and K-M9 (13 608 years; CI: 9225–19 709) also reflect a late Paleolithic timeline and overlap with the ages (14–58 KY) of upper Pleistocene human remains from Tabon Cave, Palawan, Philippines.2 TMRCA estimates for O-M119 (7299 years; CI: 5046–10 204), O-M110 (4167 years; CI: 2837–5898), and O-M122 (6223 years; CI: 4151–8886) in FE groups overlap with the time of the Austronesian expansion (5–6 KYA).4 In association with the Austronesian expansion, these haplogroups are presumed to have reached the Philippines via Taiwan. However, within the reference data set used in this study, only O-M110 has a clear Taiwan-specific origin.23 An alternative scenario has been suggested for O-M122, involving expansion from East Asia to Southeast Asia and on to Melanesia.33 O-M122 was found only among central and southern FE groups, with significant frequency differences among northern, central, and southern groups (χ² test; P=0.000029). O-M122 is one of several NRY haplogroups whose diversity has been affected by drift,34 which could explain its absence in northern FE groups. However, a southern entry point into the Philippines could also explain our results. The FE group data for the major O haplogroups simply suggest that they expanded into the Philippines, either from Taiwan or from elsewhere in East Asia, in association with the Austronesian expansion. Overall, this study presents genetic evidence for the settlement of NRY lineages in the Philippines within a time frame that begins after the initial colonization of the Asia-Pacific region (40–50 KYA) and extends into the time frame of the Austronesian expansion.
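
With three regions and a carrier/non-carrier split, a chi-square test of homogeneity has (3 − 1)(2 − 1) = 2 degrees of freedom, for which the upper-tail probability has the closed form exp(−X²/2). The sketch below uses that shortcut. The exact table layout behind the reported test is an assumption, and the counts are invented for illustration; the chi-square approximation also becomes unreliable when expected counts are small.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_df2(stat):
    """Upper-tail chi-square probability for 2 degrees of freedom, which is exp(-x/2)."""
    return math.exp(-stat / 2)


def region_test(carrier_counts):
    """3 x 2 test of carriers vs non-carriers across three regions (df = 2)."""
    if len(carrier_counts) != 3 or any(len(row) != 2 for row in carrier_counts):
        raise ValueError("expects three regions x [carriers, non-carriers]")
    stat = chi_square_statistic(carrier_counts)
    return stat, p_value_df2(stat)


# Hypothetical [carriers, non-carriers] for northern, central, and southern groups.
print(region_test([[0, 80], [12, 88], [20, 70]]))
```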

Recently, an analysis of about 50 000 SNPs in a wide variety of Asian populations, including several FEN groups, found no clear-cut genetic distinction between FEN and FEnN groups and concluded that FEN and FEnN groups were part of the same wave of migration.14 This conclusion seems at odds both with the NRY data, which indicate an earlier presence of haplogroups C-RPS4Y and K-M9 in the Philippines that is associated with FEN groups, and with mtDNA data indicating novel and ancient mtDNA lineages in a FEN group (E Gunnarsdottir and M Stoneking, unpublished data). One possible reconciliation of the genome-wide SNP data with the uniparental marker data is that the ancestors of FEN groups were isolated from the ancestors of FEnN groups (either as a separate migration or as part of the same migration), but that FEN groups have recently experienced substantial, primarily male-mediated admixture with FEnN groups.35 Such substantial admixture, documented in the NRY data as discussed above, could account for the genome-wide SNP data.14
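
The proposed reconciliation rests on the fact that uniparental markers and autosomes respond differently to sex-biased gene flow. A toy deterministic model makes this concrete: it tracks the expected source-group ancestry on the Y chromosome, mtDNA, and autosomes of a recipient group in which a fixed fraction of fathers and of mothers come from the source group each generation. The function name and rates are hypothetical, and drift, which the NRY data show to be substantial, is ignored.

```python
def sex_biased_admixture(male_rate, female_rate, generations):
    """Expected source-group ancestry in a recipient group under continuous admixture.

    Each generation, a fraction male_rate of fathers and female_rate of mothers come
    from the source group (ancestry 1); the rest are recipient-group members carrying
    the previous generation's ancestry. Drift and selection are ignored.
    Returns (Y chromosome, mtDNA, autosomal) ancestry fractions.
    """
    y = mt = auto = 0.0
    for _ in range(generations):
        # Autosomes: half from the father, half from the mother (uses last generation's value).
        auto = 0.5 * (male_rate + (1 - male_rate) * auto) + 0.5 * (
            female_rate + (1 - female_rate) * auto
        )
        y = male_rate + (1 - male_rate) * y  # Y is inherited only from fathers
        mt = female_rate + (1 - female_rate) * mt  # mtDNA only from mothers
    return y, mt, auto


# Hypothetical rates: 10% of fathers but 1% of mothers from the source group per generation.
for g in (1, 10, 40):
    y, mt, auto = sex_biased_admixture(0.10, 0.01, g)
    print(g, round(y, 3), round(mt, 3), round(auto, 3))
```

Under male-biased rates, Y-chromosome ancestry from the source rises fastest, mtDNA slowest, and autosomes in between, so a group can retain distinctive maternal lineages while looking considerably more similar to its neighbors genome-wide.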

Another interesting finding was the link between the C-RPS4Y and K-M9 haplotypes of some FEN groups and those of indigenous Australians. An early study of FEN groups using immunoglobulin G genetic markers inferred a possible ancient link with northeastern indigenous Australians.12 With information only from the NRY, our demographic inference yielded broad divergence time estimates but also inferred gene flow, which seems to run counter to the commonly held view that there was little contact between Australia and the rest of the world after initial colonization.36 However, given the broad divergence time estimates, as well as the ancient associations inferred from network and haplotype sharing analyses, the inferred gene flow between FEN groups and indigenous Australians need not imply direct contact between these groups. An alternative possibility is that gene flow from Australia to the Philippines occurred only via neighboring populations, in a ‘stepping-stone’ manner. Data from additional loci would be needed to confirm this apparent signal of more recent gene flow involving FEN and indigenous Australian groups.

Our study provides a number of important insights into the genetic history of Filipinos. First, there is extensive diversity in the NRY lineages of FE groups, and in general a simple dichotomy of Negrito vs non-Negrito is not supported by Filipino NRY data. Second, Filipino NRY lineages reflect a chronology, and thus a history, that extends from after the initial colonization of the Asia-Pacific region into the time frame of the Austronesian expansion. Third, the diverse NRY genetic affinities of Filipinos can also be associated with the Austronesian expansion, with older affinities reflecting demographic scenarios after the initial colonization of the Asia-Pacific region, including the possible divergence and subsequent gene flow between some Negrito groups and indigenous Australians. Although the Y chromosome is a powerful tool for studying demographic events and population history, and has yielded some important new insights in the present study, it is still a single genetic locus. Genetic studies of other loci are needed to obtain more accurate estimates of demographic parameters, to verify genetic associations, and to further characterize Filipino genetic diversity.