Introduction

The Austronesian diaspora is believed to have been initiated by the migration of the Lapita peoples from Taiwan around 5,500 years ago, who settled throughout Southeast Asia, the Pacific and Madagascar in the Indian Ocean just off the coast of East Africa. According to Ruhlen (1994), this oceanic traversal postdates the migration of Neolithic farmers from southern China (8,000 bp), who ventured across the Strait of Formosa and into Taiwan. Two main theories have been proposed to explain the Austronesian dispersal to Southeast Asia and the Pacific: the “entangled-bank” and “express-train” hypotheses. The first states that Polynesian inhabitants derive from Melanesian stock rather than originating recently in Asia (Terrell et al. 1997; Hagelberg 1999; Kayser et al. 2000, 2006; Oppenheimer and Richards 2001a, b; Hurles et al. 2002), while the latter espouses Formosan origins and a rapid dispersal through Micronesia into Polynesia (Melton et al. 1995; Bellwood 1997; Lum 1998; Green 1999; Hagelberg et al. 1999; Diamond 2000; Gray and Jordan 2000; Trejaut et al. 2005). The “express-train to Polynesia” model further stipulates that proto-Austronesians arrived in Taiwan around 5,500 bp and had reached the Philippines by 5,300 bp (Gray and Jordan 2000). From the Philippines, two diverging routes seem probable: a western trajectory resulting in the colonization of Malaysia, the Indonesian archipelago and Madagascar, and an eastern course leading to the settlement of Borneo, Sulawesi, New Guinea and, finally, Western Polynesia around 3,200 bp (Gray and Jordan 2000).

The island of Madagascar, located at the western fringes of this dissemination, is separated from continental East Africa by the Mozambique Channel, spanning a mere 300 miles (Singer et al. 1957). The Malgache language is a member of the Malayo-Polynesian offshoot of the Austronesian family; nevertheless, certain words are Bantu in origin (Dahl 1951, 1988; Singer et al. 1957; Adelaar 1995). Phenotypically, the Malagasy display a wide array of physical features ranging from Asiatic to sub-Saharan African and mosaics of the two (David 1940; Singer et al. 1957). Although several studies have established that both African and Southeast Asian populations have contributed to Madagascar’s gene pool, the relative proportions and specific source populations remain unclear.

Early research based on the ABO blood group led to the hypothesis that a Malagasy tribal population, the Hova people, arose from the admixture of Mongoloid migrants from Malaya with Madagascar’s native inhabitants (David 1940). A similar study based on the Rh factor determined that about 65% of Madagascar’s gene pool is of Bantu descent while the remaining 35% can be traced to Indonesia (Singer et al. 1957). More recent studies have established clearer connections to the former two ancestral populations (Migot et al. 1995; Hewitt et al. 1996) and mtDNA analyses have found traces of the “Polynesian motif” on the island (Soodyall et al. 1995). Moreover, a study including both Y-chromosome and mtDNA lineages tracks the Southeast Asian influence to Borneo and reports that only 38% of the Malagasy mtDNA and 55% of Y-chromosomal lineages are of African descent (Hurles et al. 2005).

At the other extreme of the expansion lies Polynesia, a region encompassing several island chains including the Samoan and Tongan archipelagos. Samoa and Tonga are closely linked not only geographically but historically as well. During the Austronesian spread across the Pacific, it is believed that migrants first settled in Samoa and, after a migrational hiatus lasting approximately one thousand years, expanded into Tonga and the rest of Polynesia (Soljak 1946). While more phylogenetic studies have been conducted on these Pacific islands than on Madagascar, a dichotomy exists between data generated from Y-chromosomal and mtDNA studies. Analyses utilizing Y-chromosome data delineate close ties between Melanesia and Polynesia and only indirect connections to Asia (some of the Y-chromosomes present in Melanesia do originate in Asia) (Hagelberg et al. 1999; Kayser et al. 2000, 2006; Hurles et al. 2002). Kayser et al. (2000) have thus proposed the “slow-boat” theory postulating that Austronesians originated in Asia and traversed slowly through Melanesia allowing for extensive genetic interactions between the migrants and Melanesian natives.

On the other hand, mtDNA studies have established clear links between Northeast and Southeast Asia and the Pacific populations (Melton et al. 1995; Lum 1998; Hagelberg et al. 1999; Trejaut et al. 2005; Kayser et al. 2006). Trejaut et al. (2005) elaborate that the mtDNA phylogeny of populations within this region parallels the linguistic topology suggesting that the Austronesian expansion has a Formosan origin. In turn, Kayser et al. (2006) showed that the Polynesian people displayed a greater proportion of paternally derived Melanesian lineages while maternal inheritance patterns reveal close genetic ties with Asian groups. Their findings indicate that 65.8% of Polynesian Y-chromosomes and 6% of mtDNAs are of Melanesian descent, while 28.5% of Y-chromosomes and 93.8% of mtDNAs are of Asian ancestry (Kayser et al. 2006).

The current project was undertaken to assess the contribution of East Asian source populations to Austronesian groups as geographically distant as Tonga and Madagascar (approximately 8,000 nautical miles). An additional goal of this study is to identify the sub-Saharan African groups that have had an impact on the gene pool of the Madagascar populace. For the aforementioned purposes, the two Austronesian populations from Tonga and Madagascar were compared to geographically targeted Austronesian and African collections across a set of 15 autosomal short tandem repeat (STR) loci.

Autosomal STRs are hypervariable markers that, because of their large number of alleles, high heterozygosity, abundance, and widespread distribution throughout the genome, are especially useful in elucidating recent human evolutionary history (Jorde et al. 1997; Rowold and Herrera 2003; Perez-Miranda et al. 2005; Shepard et al. 2005; Shepard and Herrera 2006; Ibarra-Rivera et al. 2007). In addition, they may provide the high resolution needed to assess phylogenetic relationships among closely related populations (Rowold and Herrera 2003).

With the battery of autosomal STR markers employed in this study, we aim to provide a more representative genome-wide genetic profile of populations instead of relying on phylogenies based entirely on uniparentally inherited haplotypes. Our results indicate that while Madagascar derives most of its gene pool from the African continent, a genetic connection to Southeast Asia can also be discerned. Furthermore, the Malayo-Filipino group is outlined as the major Austronesian contributor to Madagascar, Tonga and Samoa, although influences from Formosa can also be appreciated in the three populations.

Materials and methods

Populations, sample collection and DNA isolation

Two populations of Austronesian descent, Madagascar (n = 67) and Tonga (n = 51), were characterized. Peripheral blood samples were collected from unrelated individuals in EDTA Vacutainer tubes. Genealogical information was recorded for a minimum of two generations to establish regional ancestry. DNA was extracted by the standard phenol–chloroform method (Novick et al. 1995; Antuñez de Mayolo et al. 2002). Subsequent to ethanol precipitation, the purified DNA samples were stored as stock solutions in 10 mmol/l Tris–EDTA at −80°C. All collections were performed in adherence to the ethical guidelines put forth by the institutions involved in the research project.

Reference populations

A total of 15 reference populations were used for comparison in this study, each providing data for the 15 STR loci under scrutiny. The geographical locations of all collections involved in the project are illustrated in Fig. 1. The reference populations, abbreviations of collections, linguistic affiliations and number of alleles per population are provided in Table 1.

Fig. 1 Geographic locations of populations under study and previously published collections

Table 1 Populations analyzed

DNA amplification and STR genotyping

PCR amplification was performed using the AmpFlSTR Identifiler kit (Applied Biosystems 2001, Foster City, CA, USA) for 15 autosomal STR loci (D8S1179, D21S11, D7S820, CSF1PO, D3S1358, TH01, D13S317, D16S539, D2S1338, D19S433, vWA, TPOX, D18S51, D5S818, FGA) in a GeneAmp 9600 thermocycler (Applied Biosystems). PCR protocols and cycling conditions were followed as specified by the manufacturer (Applied Biosystems). DNA fragments were separated through multi-capillary electrophoresis in an ABI Prism 3100 Genetic Analyzer (Applied Biosystems) following the addition of formamide and GeneScan 500LIZ internal size standard to each sample. Genotyping was performed by comparing amplicons to the allelic ladder and internal size standard using the GeneScan 3.7 and Genotyper 3.7 NT software for the Madagascar collection, while GeneMapper 3.2 was utilized for the Tongan group.

Data analysis

Allelic frequencies were determined utilizing the GenePop web-based program, version 3.4 (Raymond and Rousset 1995). The PowerStats 1.2 software (Jones 1972; Brenner and Morris 1990; Tereba 1999) was used to calculate several parameters of population genetics interest including Matching Probability (MP), Power of Discrimination (PD), Polymorphic Information Content (PIC), Power of Exclusion (PE) and Typical Paternity Index (TPI). These indexes were calculated in order to assess the ability of the STR loci typed to discriminate between individuals and to appraise variability within specific loci.
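As an illustration of how the indexes listed above relate to the raw genotype data, the following minimal sketch computes them for a single locus using the commonly cited textbook formulas (MP as the sum of squared genotype frequencies, PD = 1 − MP, PIC after Botstein and colleagues, PE = h²(1 − 2hH²) and TPI = 1/(2H), where H and h are the homozygote and heterozygote frequencies). The function name and the input layout are assumptions made for this example, and the sketch is not a reproduction of the PowerStats implementation.

```python
from collections import Counter
from itertools import combinations


def forensic_parameters(genotypes):
    """Single-locus indexes from a list of (allele_a, allele_b) genotypes.

    Hypothetical helper for illustration; not the PowerStats code.
    """
    n = len(genotypes)
    # Genotype frequencies (allele order within a genotype is irrelevant).
    geno_counts = Counter(tuple(sorted(g)) for g in genotypes)
    geno_freqs = [c / n for c in geno_counts.values()]
    # Allele frequencies from the 2n gene copies.
    allele_counts = Counter(a for g in genotypes for a in g)
    p = [c / (2 * n) for c in allele_counts.values()]

    mp = sum(f * f for f in geno_freqs)            # Matching Probability
    pd = 1 - mp                                    # Power of Discrimination
    pic = (1 - sum(x * x for x in p)
           - sum(2 * a * a * b * b for a, b in combinations(p, 2)))
    hom = sum(1 for a, b in genotypes if a == b) / n   # homozygote frequency H
    het = 1 - hom                                      # heterozygote frequency h
    pe = het * het * (1 - 2 * het * hom * hom)     # Power of Exclusion
    tpi = float("inf") if hom == 0 else 1 / (2 * hom)  # Typical Paternity Index
    return {"MP": mp, "PD": pd, "PIC": pic, "PE": pe, "TPI": tpi}
```

Combined values across loci (for example, the combined matching probability) would then follow as products over the 15 loci under the usual assumption of independence between markers.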

Observed and expected heterozygosities (Ho and He, respectively) were generated with the aid of the Arlequin software package, version 2.000 (Levene 1949; Guo and Thompson 1992; Schneider et al. 2000) to ascertain departures from Hardy–Weinberg equilibrium (HWE) expectations and heterozygote deficiencies. Statistical significance was assessed before and after applying the Bonferroni correction (α = 0.05/15 = 0.0033 for 15 loci).
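The quantities in this step can be sketched as follows. The example computes the observed heterozygosity, an expected heterozygosity with the usual n/(n − 1) small-sample correction over gene copies, and the Bonferroni-adjusted threshold for 15 loci. The function names and the P values in the usage note are hypothetical, and the exact HWE test performed by Arlequin is not reproduced here.

```python
from collections import Counter


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def heterozygosities(genotypes):
    """Return (Ho, He) for one locus from (allele_a, allele_b) genotypes."""
    n_ind = len(genotypes)
    ho = sum(1 for a, b in genotypes if a != b) / n_ind
    n_genes = 2 * n_ind
    counts = Counter(a for g in genotypes for a in g)
    sum_p2 = sum((c / n_genes) ** 2 for c in counts.values())
    # Expected heterozygosity with small-sample correction (assumed form).
    he = n_genes / (n_genes - 1) * (1 - sum_p2)
    return ho, he


def significant_loci(p_values, alpha):
    """Loci whose HWE P value falls at or below the threshold."""
    return [locus for locus, p in p_values.items() if p <= alpha]
```

For instance, `bonferroni_alpha(0.05, 15)` gives approximately 0.0033, and with made-up P values such as `{"D8S1179": 0.02, "vWA": 0.001}` only the second locus would remain significant after correction.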

Ancestry-informative markers were identified based on average Fst distances as described by Collins-Schramm et al. (2002, 2003, 2004) in order to determine whether the markers included in our analysis provide tangible data on the descent of Austronesian populations and to delineate STR loci especially robust for discriminating among Austronesian peoples. Fst distances were estimated using the program Arlequin, version 2.000 (Weir and Cockerham 1984). Significance was assessed at α = 0.05.

A correspondence analysis (CA) was performed utilizing the NTSYSpc 2.02i software (Rohlf 2002) and a Maximum Likelihood (ML) tree, based on Fst distances (Reynolds et al. 1983), was constructed with the software PHYLIP 3.52c (Felsenstein 2002) in order to deduce phylogenetic relationships between the populations under analysis. Bootstrap analysis involved 1,000 replications.

The DISPAN program (Ota 1993) was employed to estimate inter, intra and total population genetic variance components (Gst, Hs and Ht, respectively). For this purpose, populations were partitioned into five groups:

  1. Austronesian-speaking (Ami, Atayal, Bali, Java, Madagascar, Malaysia, Philippines, Samoa and Tonga);

  2. Austronesian-speaking excluding Madagascar (all other populations included in the previous group);

  3. Melanesians (Australian aborigines, East Timorese residing in Australia, East Timor);

  4. Niger-Congo-speaking (Angola, Equatorial Guinea, Hutu, Kenya, Mozambique, South Africa, Tutsi); and

  5. All populations (including all populations encompassed by the first, third and fourth groups).
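The relationship between the three variance components can be illustrated with a minimal sketch based on Nei's definition, Gst = (Ht − Hs)/Ht, where Hs is the mean within-population gene diversity and Ht the gene diversity of the pooled group. The function name and input layout are assumptions for this example, populations are weighted equally, and the sample-size corrections that DISPAN may apply are omitted.

```python
def gene_diversity_components(pop_freqs):
    """Return (Hs, Ht, Gst) for one locus.

    pop_freqs: list of dicts mapping allele -> frequency, one per population.
    Hypothetical helper; uncorrected estimates with equal population weights.
    """
    k = len(pop_freqs)
    alleles = set().union(*pop_freqs)
    # Hs: average of the within-population gene diversities.
    hs = sum(1 - sum(f * f for f in pop.values()) for pop in pop_freqs) / k
    # Ht: gene diversity computed from the mean allele frequencies.
    mean_p = {a: sum(pop.get(a, 0.0) for pop in pop_freqs) / k for a in alleles}
    ht = 1 - sum(p * p for p in mean_p.values())
    gst = (ht - hs) / ht if ht > 0 else 0.0
    return hs, ht, gst
```

Two populations with allele frequencies (0.8, 0.2) and (0.2, 0.8) give Hs = 0.32, Ht = 0.50 and Gst = 0.36, whereas two identical populations give Gst = 0. Multi-locus values would be obtained by averaging Hs and Ht over loci before taking the ratio.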

The Carmody program’s G test (Carmody 1990), employing the Bonferroni adjustment (α = 0.05/146 = 0.000342) to minimize type I errors, was conducted to detect any statistically significant genetic differences between populations. P values at or below α are presumed to indicate heterogeneity between population pairs whereas values above α suggest that the two do not differ significantly from each other and are thus genetically homogeneous.
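The statistic underlying such a pairwise comparison can be sketched as a log-likelihood ratio (G) computed on a 2 × k table of allele counts, G = 2 Σ O ln(O/E). The sketch below returns G and its degrees of freedom for one locus and shows the adjusted threshold quoted above. The function name and input layout are assumptions, and the way Carmody's program derives P values from the statistic is not reproduced here.

```python
from math import log


def g_statistic(counts_a, counts_b):
    """G statistic and degrees of freedom for two populations at one locus.

    counts_a, counts_b: dicts mapping allele -> observed allele count.
    Hypothetical helper for illustration only.
    """
    alleles = [a for a in set(counts_a) | set(counts_b)
               if counts_a.get(a, 0) + counts_b.get(a, 0) > 0]
    rows = [[counts_a.get(a, 0) for a in alleles],
            [counts_b.get(a, 0) for a in alleles]]
    row_totals = [sum(r) for r in rows]
    col_totals = [rows[0][j] + rows[1][j] for j in range(len(alleles))]
    grand = sum(row_totals)
    g = 0.0
    for i in range(2):
        for j in range(len(alleles)):
            observed = rows[i][j]
            if observed > 0:  # empty cells contribute nothing to G
                expected = row_totals[i] * col_totals[j] / grand
                g += 2 * observed * log(observed / expected)
    return g, len(alleles) - 1


# Bonferroni-adjusted threshold used for the pairwise comparisons.
ALPHA_ADJUSTED = 0.05 / 146
```

Identical count profiles give G = 0, while larger values indicate greater divergence in allele frequencies between the two collections. The resulting P value is then compared with the adjusted α of approximately 0.000342.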

Admixture tests were conducted in order to ascertain the genetic contribution of source populations to descendant populations using the SPSS 14.0 statistical software package (Long et al. 1991; Perez-Miranda et al. 2006). In these estimations, it is assumed that the loci studied are selectively neutral and that the extant collections examined are large enough to mitigate the potential impact of sampling bias. Admixture proportions reveal the genetic contributions of groups of populations to the gene pool of the hybrid collection (the population suspected of representing a genetic collage composed of differing sources). Yet, they may also reflect shared ancestry rather than direct gene flow between parent and hybrid populations, given that allelic frequencies and distributions are employed as the bases for comparison. In other words, gene flow from a third source population into both the hybrid and parental groups, instead of a direct relationship between the latter two, is possible. In addition, the populations that are selected as parentals may potentially affect the contribution proportions, especially if they are closely related. Barnholtz-Sloan et al. (2005) have indicated that STR loci can provide useful admixture information; however, exact proportions are to be taken cautiously.

For Madagascar, the parents consisted of grouped populations based on biogeographical location. Two groups, Africans (Angola, Equatorial Guinea, Rwanda Hutu, Kenya, Mozambique, South Africa and Rwanda Tutsi) and Southeast Asians (Ami, Atayal, Bali, Java, Malaysia and Philippines), were used as parents in the first analysis. The second admixture assessment employed sub-groups of the previous determination: Taiwanese Aborigines (Ami and Atayal), Indonesian (Java and Bali), Malayo-Filipino (Malaysia and the Philippines), West Africa (Angola and Equatorial Guinea) and East Africa (Rwanda Hutu, Kenya, Mozambique, South Africa and Rwanda Tutsi).

Another set of admixture tests was performed using Samoa and Tonga individually as hybrid populations. These two populations were compared against Southeast Asians and Melanesians as well as to subsets of these assemblages: Taiwanese Aborigines (Ami and Atayal), Indonesian (Java and Bali) and Malayo-Filipino (Malaysia and Philippines).
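For the two-parent comparisons described above (for example, Africans versus Southeast Asians for Madagascar), the logic of a frequency-based admixture estimate can be sketched with ordinary least squares: the hybrid allele frequencies are modeled as m·p1 + (1 − m)·p2 and m is chosen to minimize the squared deviations over all alleles. This is a simplified stand-in and not the weighted least squares procedure of Long et al. (1991) as run in SPSS. The function name and the input layout are assumptions for this example.

```python
def two_parent_admixture(hybrid, parent1, parent2):
    """Least-squares contribution of parent1 and parent2 to the hybrid.

    Each argument is a dict mapping (locus, allele) -> allele frequency.
    Hypothetical, unweighted sketch; returns (m1, m2) with m1 + m2 = 1.
    """
    keys = set(hybrid) | set(parent1) | set(parent2)
    numerator = 0.0
    denominator = 0.0
    for key in keys:
        diff = parent1.get(key, 0.0) - parent2.get(key, 0.0)
        numerator += (hybrid.get(key, 0.0) - parent2.get(key, 0.0)) * diff
        denominator += diff * diff
    if denominator == 0:
        raise ValueError("parental populations have identical frequencies")
    # Constrain the estimate to the biologically meaningful range [0, 1].
    m1 = min(1.0, max(0.0, numerator / denominator))
    return m1, 1.0 - m1
```

The denominator shrinks as the two parental groups become more similar, which illustrates why closely related parentals make the estimated proportions unstable.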

Results

Intra-population diversity

Allelic distributions for Madagascar and Tonga are listed in Tables 2 and 3, respectively, along with observed and expected heterozygosities (Ho and He, respectively), HWE P values and several important population genetics indexes including MP, PD, PIC, PE and TPI. The Madagascar collection exhibits a substantially higher number of alleles than that of Tonga (129 vs. 115, respectively), a difference expected given its proximity to, and possible gene flow from, continental sub-Saharan Africa. Although it is not the purpose of this paper to offer a detailed account of allelic frequencies, the presence of alleles 26 and 33.1 of D21S11 and 16.1 and 31 of FGA in Madagascar and their absence from other Austronesian populations is noteworthy. This may reflect gene flow from the highly diverse African mainland where they have been previously reported (e.g., D21S11 26 and 33.1 in Angola and Equatorial Guinea).

Table 2 Madagascar allelic frequencies (n = 67)
Table 3 Tonga allelic frequencies (n = 51)

Three loci (D8S1179, vWA and D5S818) in the Madagascar population and one locus (D2S1338) in Tonga depart from HWE predictions at α = 0.05 (Tables 2, 3). Yet, after applying the Bonferroni correction (α = 0.0033), no loci diverge from HWE expectations.

Relevant population genetic parameters including Combined Matching Probability (CMP), Combined Power of Discrimination (CPD), Combined Power of Exclusion (CPE) and Average Heterozygosities are provided in Supplementary Table 1. Intra-population variances (Hs) are presented in Table 4. Of all four categories, the Austronesian group [Indonesian (Java and Bali), Madagascar, Malaysia, the Philippines, Samoa, Taiwanese Aborigines (Atayal and Ami) and Tonga] possesses the lowest overall intra-population variance (Hs = 0.77324 in Table 4), while the Niger–Congo speaking populations [Kenya, Rwanda (Hutu and Tutsi), Mozambique, Equatorial Guinea, South Africans and Angola] display the highest (Hs = 0.79475) even when compared to the all populations group (Hs = 0.78320).

Table 4 Inter-population and intra-population genetic variance

Inter-population diversity

In order to assess the phylogenetic relationships among all populations, G tests as well as CA and ML analyses were performed. In addition, Gst values were generated to ascertain inter-population variance. Potential parental contributions to hybrid populations were determined by admixture analyses.

Within the CA (Fig. 2), three clearly defined clusters are apparent: Southeast Asian, African and Polynesian (Samoa and Tonga). Madagascar, although positioned closer to the African cluster, clearly strays from the latter in the direction of the Southeast Asian group. Within the Southeast Asian cluster, the Taiwanese aborigines (Ami and Atayal) display a considerable degree of genetic separation from each other, with the Atayal partitioning away into the upper left quadrant, in spite of their geographic proximity and extensive common border. The East Timorese populations segregate at a point intermediate between the Southeast Asian assemblage and the Australian aboriginal population, while the collections from Samoa and Tonga are found close to each other in the upper left quadrant, distant from all other groupings. It is notable that the Melanesian Australian and East Timorese collections partition farther from the Polynesian collections along the Y axis than any other group of populations, arguing for genetic differences between the two. The African cluster exhibits a tight grouping of populations. Altogether, the CA mirrors known biogeographical demarcations.

Fig. 2 Correspondence analysis of Austronesian and African populations

Both the African and Southeast Asian clades are well delineated in the ML dendrogram (Fig. 3) which corroborates the phylogenetic relationships portrayed by the CA. However, in contrast to the isolated positions of Samoa and Tonga in the CA, these two Pacific Austronesian populations cluster close to the Australian/East Timorese collections found adjacent to the Southeast Asian groups in the ML tree. Madagascar occupies an intermediate position, between the African clade and the Polynesian populations.

Fig. 3 Maximum Likelihood phylogram of Austronesian and African populations

Inter and total variance components (Gst and Ht, respectively) are reported in Table 4. Inter-population variance is considerably higher among the Austronesian-speaking populations (Gst = 0.03058) when compared to the Niger–Congo-speaking collections (Gst = 0.00833). Excluding Madagascar from the Austronesian group yields a mere 1.4% decrease in the Gst value, suggesting that this high inter-population diversity is a characteristic of Austronesian populations as a whole rather than attributable to the geographical outlier, Madagascar. The total variance, however, is highest amongst the Niger–Congo-speaking populations (Ht = 0.80131), corroborating the high genetic diversity of sub-Saharan African groups.

All pair-wise population comparisons except for Samoa/Tonga, Kenya/Angola and Kenya/Equatorial Guinea revealed statistically significant genetic differences as ascertained by G tests (Supplementary Table 2). After the application of the Bonferroni correction for type I errors, the differences between Java/Malaysia, Mozambique/Kenya, Mozambique/South Africa, Hutu/Kenya and Angola/Equatorial Guinea were also no longer significant.

Admixture analyses performed to assess the genetic contributions of groups of populations to the Madagascar collection are presented in Table 5. The results indicate that the African input to the Malgache autosomal gene pool is 66.1% while the Southeast Asians contribute 33.9%. Analyses employing subgroups reveal that the Taiwanese aborigines and the Malayo-Filipino assemblages contribute 17.0 and 17.8%, respectively, of the Malagasy’s autosomal component while the East African group is shown to be the major contributor to the island (46.5%). Interestingly, no input from the Indonesian groups (Bali and Java) was detected through the analysis.

Table 5 Admixture analysis of Madagascar

It is notable that Samoa and Tonga’s gene pools derive from the same genetic sources as the Malgache within Southeast Asia (Tables 5 and 6). The Taiwanese aborigines provide 3.1% of the Samoan autosomal component and 10.6% of the Tongan collection. The Malayo-Filipino group, in turn, contributes 76.5 and 56.9% to Samoa and Tonga, respectively. Altogether, Samoa derives 75.8% of its gene pool from Southeast Asian groups and only 24.2% from Melanesian populations. Similarly, Southeast Asian contributions to Tonga are 64.6% while Melanesian influences only impact 35.4% of its autosomal component. The higher contribution by the Melanesians to Tonga as compared to Samoa may reflect the greater geographical proximity of the former to Melanesia.

Table 6 Admixture analysis of Samoa and Tonga

In the process of assessing genetic relationships between Madagascar, Tonga and Samoa with Southeast Asian and African populations (in the case of Madagascar only), a series of ancestry-informative markers (AIMs) were noted. Locus FGA seems to be especially useful for identifying groups of Polynesian descent while D7S820, D5S818, D18S51, TPOX, D19S433, vWA, D2S1338, D13S317, TH01, D21S11 and D8S1179 are informative in elucidating African ancestry (for a complete list of AIMs see Supplementary Tables 3 and 4). The marker D16S539 may be used for ascertaining Atayalic descent; however, a more detailed analysis using other Taiwanese aboriginal tribes must be conducted in order to reach a consensus.

Discussion

The origins, source populations, migratory routes, and genetic relationships between Madagascar, Samoa and Tonga, and other Austronesian-speaking peoples remain unclear. Several theories have been postulated to explain Austronesian dispersal to the Pacific and Indian Oceans; however, a dichotomy presented by the data available suggests genetic influences and interactions that are highly complex. Altogether, Y-chromosomal studies indicate greater contributions from non-Austronesian versus Austronesian groups to both Madagascar and the Polynesian populations, while the opposite has been observed for those involving mtDNA (Melton et al. 1995; Lum 1998; Hagelberg et al. 1999; Kayser et al. 2000; Hurles et al. 2002; Hurles et al. 2005; Trejaut et al. 2005; Kayser et al. 2006). With Formosa as a potential origin of the Austronesian expansion (Bellwood 1990; Ruhlen 1994), a high-resolution analysis of autosomal STR markers was conducted to ascertain the phylogenetic relationships between Southeast Asian populations and groups at the eastern (Samoa and Tonga) and western (Madagascar) boundaries of the Austronesian diaspora. The bi-parental inheritance and genome-wide distribution of these hypervariable loci allow for an unbiased comprehensive assessment of phylogenetic relationships among populations. A second aim of the study is to assess which of the Southeast Asian and African populations have had the most impact on the Malagasy gene pool.

Austronesian populations most likely experienced a series of genetic bottleneck events as the migrants traveled from island to island in Southeast Asia and the Pacific Ocean during the expansion (Melton et al. 1995; Redd et al. 1995; Sykes et al. 1995; Lum et al. 1998; Richards et al. 1998; Kayser et al. 2000; Su et al. 2000; Capelli et al. 2001; Oppenheimer and Richards 2001a, b; Lum et al. 2002). Bottleneck events are reflected in the limited number of total allelic types in Polynesian (mean 116) and Taiwanese populations (mean 100) compared to African collections (mean 152 alleles).

Within the Austronesian collections, heterozygosity values range from 0.7269 in the Atayal from Taiwan (Shepard et al. 2005) to 0.8035 in the Malgache. On average, the Austronesian speaking groups possess 121 allelic types whereas collections from Asia [China (Hu et al. 2005; Wang et al. 2005), Japan (Hashiyada et al. 2003) and Korea (Kim et al. 2003)] and Africa (all populations from Africa) average 153 and 152 allelic types, respectively. Only 79.1 and 79.6% of the total number of alleles found in Asian and African collections, respectively, are present in Austronesian populations. On the other hand, mean heterozygosity values are comparable between Austronesian (0.7763) and Asian (0.7755) groups while Niger–Congo speakers average 0.7995. The lower genetic variability of Austronesian collections is also reflected in the lower intra-population variance components of Austronesian speaking peoples (Hs = 0.77324) in comparison to that of the Niger–Congo speakers (Hs = 0.79514), a trend which is expected due to the widespread diversity commonly found throughout sub-Saharan Africa (Table 4). The Malgache seem to be the exception to the rule within the Austronesians, given that their average heterozygosity is comparable to that of the African groups (Shepard and Herrera 2006). It is likely that the reduced heterogeneity observed in Austronesian groups compared to sub-Saharan African populations developed as a result of allelic drop-outs in serial bottleneck events during island hopping, while the greater number of alleles in Madagascar and continental African populations reflects their well-established high level of diversity. These findings not only mirror previous Centroid analysis results by Chow et al. (2005) indicating a high degree of gene flow into the East African island, but also lend support to the belief that different source populations have contributed to the Madagascar gene pool (Hurles et al. 2005).

The inter-population variability among the Austronesians (Gst = 0.03058) is only slightly lower than that of the all-populations group (Gst = 0.03342) and substantially higher than that of the Niger–Congo speaking populations (Gst = 0.00833) and Melanesians (Gst = 0.01309). The relatively high inter-population diversity value among the Austronesians is most likely related to the genetic differences generated by genetic drift emanating from bottleneck episodes during their diaspora. The relatively low Gst values among the Melanesians when compared to the Austronesians may suggest that the former have not been subject to recent evolutionary processes capable of partitioning them genetically. These results parallel the widespread heterogeneity among the Austronesian populations observed in the G test, in which only the geographically proximal populations (Samoa/Tonga, and Malaysia/Java after applying the Bonferroni correction) do not differ significantly, whereas several geographically distant pairs of populations from Africa yield non-significant differences (Supplementary Table 2), supporting previous reports of genetic homogeneity throughout the area (Underhill et al. 2001).

The Madagascar collection significantly differs from all populations in the G test, echoing the results of the CA plot and ML dendrogram where it does not conform to any one cluster. Admixture analysis results reveal contributions from both African (66.1%) and Southeast Asian populations (33.9%), supporting previous studies utilizing the Rh factor (Singer et al. 1957), mtDNA and the Y-chromosome (Hurles et al. 2005) signaling contributions from both regions. Our results indicate that the main contributors to the Malgache gene pool are the East African groups (46.5%), although a clear input from the West African populations (18.7%) can also be discerned. This is expected considering the geographic vicinity of insular Madagascar to continental East Africa. It is possible that the West African component results from the genetic imprint left by the Bantu expansion throughout Southeast Africa. Underhill et al. (2000) have reported that the Y-chromosomal marker E3a (M2 mutation) and its subclades, largely present within the African genetic landscape, are directly linked to the spread of Bantu farmers from West Africa. Furthermore, both Underhill et al. (2000) and Beleza et al. (2005) have found that the expansion led to a genetic displacement of older native Y-chromosomes and to a decrease in the lineage’s diversity throughout the area. In addition, based on mtDNA data, Plaza et al. (2004) have found evidence of continued interactions between West Africa (specifically Angola) and Southeast Africa. It is important to note, then, that the similarities between West Africa and Madagascar are likely due to the genetic history of mainland African populations rather than direct gene flow into the island from West African groups.

Singer et al. (1957) identified Indonesia as the main Austronesian contributor to the Malgache. In contrast, the high-resolution, biparental genetic markers and array of informative populations of the present study suggest that the Austronesian source populations of Madagascar are the Malayo-Filipino group (17.8%) and the Taiwanese aborigines (17.0%), and not the Indonesian populations from Java and Bali. These results support Y-chromosomal data by Su et al. (2000) and Hurles et al. (2005) confirming the presence of Southeast Asian Y-chromosomes in the island.

Similar to Madagascar, the Polynesian groups in this study exhibit genetic inputs from the Taiwanese Aborigines (10.6% for Tonga and 3.1% for Samoa). Interestingly, Scheinfeldt et al. (2006) have suggested that the Ami are the ancestors of all Austronesians outside Taiwan since the Y-chromosome O3a (M122) lineage found within Polynesia is represented considerably in the Ami and only at low frequencies in other Formosan aboriginal groups. Nevertheless, O3a (M122) is found in other Taiwanese aboriginal tribes making it difficult to deduce their potential contribution (Scheinfeldt et al. 2006). Resolution of the issue concerning the Taiwanese aboriginal source population to Austronesians outside Taiwan awaits systematic work involving all the Formosan tribes utilizing various types of marker systems.

The Malayo-Filipino assemblage appears to be the primary autosomal genetic contributor to both Samoa (76.5%) and Tonga (56.9%) and may signal an ancestor-descendant relationship between Malaysia and Polynesia. Malaysia and the Philippines may represent one of many stages of a migration originally from mainland or insular Southeast Asia. As with Madagascar, no Indonesian autosomal signal is detected in Samoa and Tonga (Tables 5 and 6). The absence of an Indonesian component may be indicative of an Austronesian bypass of this region during the spread, or the elimination of Austronesian DNA resulting from admixture or displacement by native and/or subsequently invading populations and/or genetic drift.

Previous studies have found a genetic separation between Samoa and other Austronesian groups (Parra et al. 1999; Shepard et al. 2005), a finding also observed in the present study for both Samoa and Tonga. Although genetically distinct from other Austronesian peoples, these two Pacific populations lie closest to the Austronesian cluster in the CA along axis 2 (Fig. 2), supporting previous mtDNA and linguistic studies suggesting that phylogenetic relationships among Austronesians are genetic in nature and not merely the product of language replacement in genetically autonomous groups (Trejaut et al. 2005). Along axis 1 of the plot, the populations are most closely related to the Australian aborigines and East Timorese, supporting previous findings by Kayser et al. (2006) based on mtDNA and Y-chromosomal data which indicate that Polynesian populations represent a composite of Southeast Asian and Melanesian lineages. Admixture analysis results support these statements and reveal that 75.8% of the Samoan autosomal component and 64.6% of Tonga’s are of Southeast Asian origin while the remainder (24.2 and 35.4%, respectively) is of Melanesian descent. Altogether, the data strengthen the claims of the “slow boat” hypothesis, proposed by Kayser and colleagues (2000), postulating that Austronesian dispersal to Polynesia occurred slowly, allowing for the assimilation of the Melanesian genetic matrix along its course.

The dichotomy in the data obtained from previous Y-chromosome and mtDNA reports does not allow a clear panorama as to the origin(s) and migrational patterns of the Austronesian expansion. In the present study, we employ a battery of STR hypervariable genetic markers to discern the representative autosomal diversity (instead of the maternally and paternally restricted lineages) and phylogenetic relationships of Austronesian-speaking groups from Madagascar as well as Tonga and Samoa in Polynesia with geographically targeted reference populations from Southeast Asia and Africa. The data indicate that the Malgache gene pool derives 66.1% of its genetic makeup from the African mainland while still retaining some of its Southeast Asian roots (33.9%). Similarly, while the Samoan and Tongan collections possess differing degrees of Melanesian influence (24.2 and 35.4%, respectively), they still exhibit a considerable contribution from insular Southeast Asia (75.8 and 64.6%, respectively). Furthermore, according to admixture proportions, the Taiwanese aborigines have contributed genetically to the collections of Samoa, Tonga, and Madagascar whereas the Indonesian groups from Bali and Java have not. These results may be indicative of an expansion route which may have originated in Formosa, dispersed southward by way of the Philippines and Malaysia, and then bifurcated into eastward (toward Micronesia/Polynesia in the Pacific) and westward (eventually reaching Madagascar by way of the Indian Ocean) trajectories. Altogether, the data support the contention that Austronesian populations share genetic components that bind them together beyond the effects of genetic drift resulting from serial bottleneck episodes as limited numbers of individuals migrated large geographic distances across vast oceanic expanses.