A perennial question in Oceanian history concerns the possibility of prehistoric contacts between Polynesian and Native American populations. Previous researchers investigating this question through genetics have focused on Easter Island (Rapa Nui). As the closest inhabited Polynesian island to the Americas, and the Polynesian island with the most elaborate megalithic culture16, Rapa Nui has been considered a likely locus for contact. High-resolution analyses of human leukocyte antigen (HLA) alleles have revealed a Native American component in modern individuals with self-identified Rapanui ancestry8,9. However, in the only two genome-wide studies of Rapanui variation—one of eight modern individuals10, and one of five skeletal remains (three from pre-European contact era and two from post-European contact)11—a Native American component was found in all samples of the former, but none of the latter. As a consequence, these studies reached opposing conclusions about pre-European contact between Polynesian individuals on Rapa Nui and Native American individuals10,11. To date no genome-wide DNA studies have considered the possibility of pre-European Native American contact on other Polynesian islands. Here we investigate both of these questions through high-density genome-wide analyses of a large data set of 166 Rapanui and 188 additional individuals from islands spanning the Pacific (Fig. 1a and Supplementary Tables 1, 2).

Fig. 1: Sampled populations with unsupervised (iterative) ADMIXTURE analysis.

a, Map, including prevailing currents and winds, showing the location of each sampled population with dots coloured by region (see key at top). b, K = 6 clustering analysis of Pacific islanders and reference populations using the ADMIXTURE method17. The reference panels include populations from: Europe (UK and Spain) and Africa (Yoruba)43, the Americas (Mapuche, including Pehuenche and Huilliche, from central and south Chile44; Aymara from southern Peru and northern Chile; northern Peruvian individuals from Magdalena de Cao; Zenu from Colombia; and Zapotec and Mixe from southern Mexico45), and at the far-left Melanesian individuals from Vanuatu (see Supplementary Fig. 5). Each individual is represented as a narrow column, coloured to show the proportion of each ancestry cluster in that individual. We included modern Colombian and Ecuadorian individuals46 as well as four ancient (pre-European contact) individuals (italics, wide columns), spaced along the coast (small dots), to further illustrate the Native American component44,47,48, but did not use these populations/individuals as reference panels owing to their lower marker density. See Supplementary Table 5 for the distinction between the nomenclature for early modern and ancient Oceanian clusters. NE, northeast; SE, southeast.

Multiple admixture events in Polynesia

We first performed a global ancestry analysis of our Polynesian and coastal Native American samples together with continental reference populations using the ADMIXTURE algorithm17 (Fig. 1b, Supplementary Figs. 1–8, Supplementary Tables 3–5 and Supplementary Discussion) and principal component analysis (PCA; Supplementary Fig. 9). We followed these variant-frequency-based analyses with an independent, sequence-matching-based analysis (local ancestry inference18) in order to identify precise genomic regions of Polynesian, Native American, European and African origin in each individual for use in later ancestry-specific analyses (Fig. 2a and Supplementary Table 6).

Fig. 2: Relationship between Polynesian, Native American and European ancestries.

a, Random-forest-based local ancestry inference of a Rapanui individual, showing small (old) Native American ancestry tracts embedded in Polynesian ancestry tracts. The ancestry of each haploid genome is coloured (top and bottom for each homologous chromosome of a pair); the autosome pairs are numbered along the vertical axis and genetic position in centimorgans (cM) is on the horizontal axis. b, Ternary plot of ADMIXTURE ancestry fractions in Rapanui individuals having Polynesian, European and central Native American, but no other, ancestries (each point corresponds to an individual). The first principal component in the centred log-ratio transform space49 is projected onto the figure as a dashed curve. The ancestries’ log-ratio variances are discussed in Supplementary Tables 7–10. c, d, Length-distribution analyses for ancestry tracts in the six Rapanui individuals with no European ancestry (c) and in North Marquesan individuals (d). Plotted points show the aggregate tract length counts; lines show the maximum-likelihood best-fit tract length distributions; and shading shows one standard deviation confidence intervals, assuming Gaussian noise. The best-fit admixture chronology is plotted above the timeline as a line history, with each colour representing an ancestry, as indicated in the key, and with generation times given from the admixture points (black) to the present.

In all of these ancestry analyses, the Pacific island populations are characterized by a large Polynesian component, but with many islanders also having a European component from colonial admixture. Remarkably, in both of these independent analyses, as well as in f4 and D-statistic analyses (P < 0.001), we also detect admixture in eastern Polynesia from Native American individuals, even when using pre-European-contact Native American reference panels (Supplementary Figs. 10, 11). Looking at the ADMIXTURE plot, in the easternmost Polynesian islands (Palliser, Marquesas, Mangareva and Rapa Nui), but on no other Polynesian islands, two Native American ancestry components can be seen. These components are characteristic of central (green in the figures) and southern (yellow) Native American populations (both modern and ancient). The southern Native American component—highest in the Mapuche and Pehuenche native peoples of Chile—increases in present-day Rapanui individuals in proportion to their European (red) ancestry component (Fig. 1b, Supplementary Fig. 12 and Supplementary Tables 7, 8). This is consistent with the idea that the Native American component arrived on Rapa Nui together with a Spanish European component via immigration of admixed Chilean individuals following Chile’s annexation of the island. By contrast, the central Native American component, characteristic of indigenous Mexican individuals (Mixe and Zapotec) and indigenous Colombian individuals (Zenu), is associated on Rapa Nui only with the Polynesian component, not with the European or southern Native American components, according to the log-ratio variances of those components (Fig. 2b and Supplementary Tables 7–10). This suggests that the central Native American component arrived onto Rapa Nui independently from the European component.
Furthermore, in contrast to the southern Native American component (Chilean), the central Native American component varies little between Rapanui individuals, indicating that it stems from an older admixture event19,20. Indeed, the Native American DNA segments in Rapanui individuals have an aggregate length distribution that indicates initial contact several centuries before European individuals entered the Pacific (Fig. 2c). Intriguingly, the central Native American component (green) is found in the other remote eastern Polynesian islands (Palliser, Marquesas and Mangareva) and has a similarly early date (Fig. 2d).
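The log-ratio variance comparison can be illustrated with a small sketch. For two ancestry fractions measured across individuals, the variance of ln(x/y) is close to zero when the two components rise and fall in proportion (as expected if they arrived together) and is larger when they vary independently. The fractions below are invented for illustration and the function name is hypothetical; this is a minimal sketch of the idea, not the authors' code.

```python
import math
from statistics import pvariance

def log_ratio_variance(x, y):
    """Variance across individuals of ln(x_i / y_i) for two ancestry fractions.

    Fractions must be strictly positive; zero values would need special
    handling that this sketch does not attempt.
    """
    return pvariance([math.log(a / b) for a, b in zip(x, y)])

# Invented ADMIXTURE-style fractions for five individuals (not real data).
european = [0.10, 0.20, 0.30, 0.40, 0.05]
southern_na = [0.02, 0.04, 0.06, 0.08, 0.01]  # proportional to european
central_na = [0.06, 0.05, 0.07, 0.06, 0.05]   # roughly constant

v_proportional = log_ratio_variance(southern_na, european)
v_independent = log_ratio_variance(central_na, european)
```

In this toy input the proportional pair gives a variance of essentially zero, while the roughly constant component gives a clearly larger value against the same European fractions.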

When we investigate the European ancestry in Polynesian individuals, we see correspondences with the European nations that colonized each island. For example, the European component in French Polynesian individuals clusters with French reference panels in our new ancestry-specific multidimensional scaling (MDS) analysis (Fig. 3a). Those Rapanui individuals with southern Native American ancestry in the ADMIXTURE analysis are shifted towards the Spanish reference panels in this European-specific MDS analysis, consistent with those two ancestries arriving together via immigration of admixed Chilean individuals. The remaining Rapanui individuals that have European ancestry, but no southern Native American ancestry, cluster largely with French reference panels, which is consistent with the French origin of the first European residents on Rapa Nui21. We also analyse long, shared DNA segments (larger than 7 centimorgans) that are inherited by related individuals and are termed ‘identical by descent’ (IBD). Analysing genomic regions of only European ancestry, we find that the IBD relationship network mirrors European settlement patterns, with many French Polynesian islands forming one connected component, separate from Rapa Nui (Fig. 3b and Supplementary Table 11). Rapa Nui’s single European connection to Mangareva may reflect the transfer21 of French Catholic missionaries from Rapa Nui to Mangareva in ad 1871.

Fig. 3: Analysis of European ancestry in Pacific islanders.

a, Our new ancestry-specific MDS applied to the European ancestry of each admixed sample from the Pacific islands, together with European reference individuals from the Population Reference Sample (POPRES) data set50, shows French Polynesian islanders (labelled within the plot) clustering with French individuals. In addition, Rapanui individuals (diamonds) cluster with Spain or France, depending on whether (green) or not (violet) they also have southern Native American (Chilean) ancestry. The numbers of samples from each country are given in Supplementary Table 3. b, IBD sharing of European ancestry segments in Polynesia is strongest (darker and thicker lines indicate a higher probability of sharing; see key at bottom) between island clusters with the same European colonial backgrounds. The islands’ sample sizes are given in Supplementary Table 1.

Native American ancestry in Polynesia

The Native American ancestry in eastern Polynesian individuals shows a very different pattern of interisland IBD sharing, indicating a different history of Native American contact (Fig. 4a, Supplementary Fig. 13 and Supplementary Table 12). To characterize the origin of this ancestry in Polynesia more precisely, we applied a new ancestry-specific PCA to the Native American component (see Methods and Supplementary Figs. 14, 15). The first principal component is found to order the Native American reference individuals along a north–south axis, coinciding roughly with the Pacific coast of the Americas (Supplementary Fig. 14). We plot the density of the Native American reference individuals along this first principal component axis together with the location of the aggregate Native American components for each of the eastern Polynesian islands possessing such ancestry (Fig. 4b). Consistent with our ADMIXTURE analysis, which showed a central Native American component in Pacific islanders, in this analysis the Native American ancestries of the Pacific islanders all fall within, or beside, the Zenu people—an indigenous Colombian population. The localization of the Native American component to Colombia–Ecuador is shown clearly by our new, lower-noise, ancestry-specific MDS analysis, as well as by PCA, and is consistent with the less-sensitive traditional Procrustes analysis and outgroup-f3 statistic (Supplementary Figs. 16–22). The only exceptions are the Rapanui individuals with high European ancestry. As expected, their Native American component, which probably came together with their European component through immigration of admixed Chilean individuals to Rapa Nui, is located squarely within the Pehuenche and Mapuche native populations of central Chile (Fig. 4b and Supplementary Figs. 14, 16, 18–22).
The Native American ancestry component in Rapanui individuals with no European ancestry, by contrast, clusters with the Colombian Zenu people, just as with the other eastern islands.

Fig. 4: Origin and spread of early Native American ancestry in Polynesia.

a, Results of a Native American specific IBD analysis reflect the common ancestry and origin of the Native American component in easternmost Polynesia. EA, European ancestry. b, Our new ancestry-specific PCA (left) separates Pacific Rim Native American reference individuals along a north–south axis, as shown in this kernel density plot of the numbers of individuals from select reference populations along the first principal component (PC1) axis. See Supplementary Figs. 14, 18 for the full two-dimensional plot. Colours indicate the reference populations’ locations in the Americas (right). The locations of the aggregate Native American specific components for each Pacific island are also plotted (black dots connected by dashed lines to their source island in a). The maximum likelihood date for the Native American introgression event in each island population, as determined by a Tracts analysis (Supplementary Fig. 23), is displayed under the corresponding dashed line. The numbers of samples used from each island and each American population are given in Supplementary Table 1.

Apart from the Chilean annexation of Rapa Nui in ad 1888 and sporadic interactions with ships’ crews, the only recorded events potentially connecting Pacific islanders with Native American ancestry are the Peruvian slave raids of ad 1862–1863. During this period, thousands of Pacific islanders were kidnapped and taken to Peru as forced labourers, including 1,407 Rapanui individuals22. Following an international outcry, a few repatriation voyages were organized, but smallpox outbreaks onboard meant that only a handful of passengers made it back to Polynesia alive. Only two of the islands in our data set received any recorded returnees: Rapa Nui (15 repatriated) and Rapa Iti (9 captive individuals from other islands resettled). With very few individuals, all self-identifying as islanders, returning to Polynesia, and with their captivity in Peru lasting only a few months, it is unlikely that this episode resulted in any introgression of Native American ancestry into Polynesia. However, such explanations have been advanced11,23. In any case, the Native American component that we observe in the easternmost islanders, including on distant islands untouched by returnee voyages, derives from an indigenous American population lying to the north of both of our Peruvian Native American reference individuals, namely the southern Peruvian Aymara and the northern Peruvian Magdalena (Fig. 4b and Supplementary Figs. 16, 18).

Our localization of the Native American ancestry found in Polynesia is consistent with several linguistic, historical and geographical observations that support an origin in northern South America. Although superficial similarities between the monolithic statues of the Pacific islands (found only in the remote eastern Polynesian islands) and those of the pre-Columbian site of San Agustín, Colombia, have long been noted2, stronger evidence has come from the Polynesian word for the sweet potato, ‘kumala’. This word has been linked to names for the food in northern South America, where it originated2,3,24. The coastal languages that use these related names lie to the north of Peru—for example, ‘cumal’ is used by the Cañari people of Ecuador25—whereas the Peruvian languages that use such names are Andean and located far from the coast. It is to the north of Peru that the Pacific coast changes from desert to forests suitable for boat construction, and it is from Pacific Ecuador and Colombia that Native American voyagers are believed26,27,28,29 to have embarked for trade with Mesoamerica in large ocean-going sailing rafts made of balsa wood during the period ad 600 to ad 1200. Wind and current simulations from the Pacific coast of the Americas have shown that drift voyages departing from Ecuador and Colombia are the most likely to reach Polynesia, and that they arrive with the highest probability in the South Marquesas islands, followed by the Tuamotu Archipelago4. Both of these archipelagos lie at the heart of the region of islands where we have found a Colombian Native American component. The trade winds and the south equatorial current move east to west at these latitudes, funnelling boaters from northern South America to the archipelagos26 (Fig. 1a).
(In Thor Heyerdahl’s famous drift voyage from Peru to Polynesia, his Kon-Tiki raft had to be towed 80 miles offshore from Peru, because the southern current along the Peruvian coast was so unfavourable; once in the trans-Pacific currents the Kon-Tiki raft landed in the Tuamotu Archipelago.) For the same reason, these archipelagos would be the most likely origin for Polynesian individuals discovering the Americas using their characteristic upwind exploration30,31.

Dating Native American–Polynesian contact

To determine when the Native American component was introduced into each of the affected Polynesian populations in our data set (Nuku Hiva in the North Marquesas, Fatu Hiva in the South Marquesas, Palliser in the Tuamotu Archipelago, Mangareva, and Rapa Nui), we modelled the length distribution of the Native American, European and Polynesian ancestry segments of the islanders using the Tracts method32 (Supplementary Figs. 23, 24). For all island populations—with one expected exception discussed below—we find that the model with the highest likelihood involves an initial Native American–Polynesian admixture event, followed centuries later by European introgression (Supplementary Table 13). Those later estimated European admixture dates (North Marquesas ad 1820, South Marquesas ad 1830, Mangareva ad 1750 and Palliser ad 1790) fall within the period of European colonization of Polynesia. By contrast, the dates estimated for Native American–Polynesian admixture on the islands are much earlier, and they are similar across the different islands (Mangareva ad 1230, Palliser ad 1230, North Marquesas ad 1200 and South Marquesas ad 1150).

The only exception to these consistently early dates is on Rapa Nui itself, where Rapanui individuals with no European (colonial) ancestry have a slightly later estimated Native American introgression date (ad 1380; Supplementary Fig. 23a). However, this inferred date may be shifted later owing to more recent Native American introgression from Chile, as already discussed. Indeed, Rapanui individuals who have high European and Native American ancestry (Supplementary Fig. 23b) show Native American introgression predominantly during the colonial period (best fit, ad 1720). According to the best-fitting model, this represents Native American introgression into European ancestry first, probably occurring in Chile, followed later by addition of Polynesian ancestry (best fit, ad 1860), probably when admixed Chilean individuals began immigrating to Rapa Nui (Fig. 4b). That latter date (ad 1860) is slightly before the annexation of Rapa Nui by Chile (ad 1888); however, by this time 12 Chilean individuals (out of approximately 100 total inhabitants) were already recorded to be living on Rapa Nui33. Indeed, because of its relation to modern Chile, we find that Rapa Nui is one of the most complicated places to study and date the prehistoric Native American contact in eastern Polynesia (see Supplementary Fig. 24).

To confirm our Tracts dating of the early Native American introgression in Polynesia, we used an alternative, linkage-disequilibrium-based, dating method34 (Supplementary Fig. 23g and Supplementary Table 14). Unlike the Tracts approach, this method (ALDER) does not rely on phasing or local ancestry inference, instead fitting the exponential decay of linkage disequilibrium within an admixed target population directly, using two reference populations as proxies for the ancestral sources35. We used data from unadmixed Native American individuals from Peru and indigenous Austronesian individuals from Taiwan as reference populations, and islanders with Native American and Polynesian admixture only (no European ancestry) as targets (six Rapanui, four Mangareva, two Palliser and one North Marquesas). For these pooled individuals, we obtained an estimated admixture date of ad 1234 ± 90 years (Supplementary Fig. 23g). We note that all of our Tracts date estimates are contained within the confidence interval of this ALDER estimate, except for the aforementioned special case of the Rapanui individuals with recent Chilean ancestry.
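The principle behind linkage-disequilibrium-based dating can be sketched as follows: under a single admixture pulse n generations ago, admixture LD decays approximately as exp(−n·d) with genetic distance d in Morgans, so the decay rate estimates n. The sketch below fits that rate by log-linear least squares on a synthetic, noise-free curve and converts it to a calendar year. The 30-year generation time, the sampling year of 2000, and both function names are illustrative assumptions, not values taken from this study, and the sketch omits features of the published method such as standard-error estimation.

```python
import math

def fit_decay_rate(distances_morgans, ld_values):
    """Least-squares slope of ln(LD) against distance.

    Returns n (generations) under the simplified model
    LD(d) = A * exp(-n * d); all LD values must be positive.
    """
    xs = distances_morgans
    ys = [math.log(v) for v in ld_values]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return -sxy / sxx

def generations_to_year(n_generations, years_per_generation=30.0,
                        sampling_year=2000):
    """Convert generations before sampling into a calendar year (assumed units)."""
    return sampling_year - n_generations * years_per_generation

# Synthetic decay curve: 25 generations, distances from 0.5 cM to 20 cM.
true_n = 25.0
d = [0.005 * k for k in range(1, 41)]
ld = [0.02 * math.exp(-true_n * x) for x in d]

n_hat = fit_decay_rate(d, ld)
year = generations_to_year(n_hat)
```

With real data the curve is noisy, so the fitted rate carries an uncertainty that a point estimate like this does not convey.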


The Native American component within each of these widely separated remote eastern Polynesian islands has a similar introgression date, a common source in the indigenous peoples of Colombia, and a dense shared IBD network indicating shared ancestors. Each of these results is most parsimoniously explained by a single prehistoric contact event between eastern Polynesian and Native American individuals. Although the island contacted is not yet clear, and perhaps is not found in our present data set, it is likely that the contact occurred during the original period of discovery and settlement of remote eastern Polynesia by Polynesian individuals. Descendants of the initial contact probably transmitted their dual ancestry to new islands upon settling them; interisland trade contact may also have played a part. Thus, the prehistoric Native American component on Rapa Nui, upon which so much research has focused9,10,11, is likely to have originated from a contact event not on Rapa Nui, but somewhere upstream in the Polynesian settlement process. This would explain a human-mediated spread of the sweet potato throughout Polynesia, if, as some have speculated, the Polynesian settlement of Rapa Nui involved no return voyaging or trade links31,36,37,38.

Our earliest estimated date of contact is ad 1150 for Fatu Hiva, South Marquesas. This is close to the date estimated by radiocarbon dating for settlement of that island group13, raising the intriguing possibility that, upon their arrival, Polynesian settlers encountered a small, already established, Native American population. It was on the island of Fatu Hiva—the easternmost island in equatorial Polynesia—that Thor Heyerdahl hypothesized that Native American and Polynesian individuals might have contacted one another, based on islanders’ legends stating that their forefathers had come from the east39. The Marquesas lie at the latitude of Ecuador, and wind- and current-based simulations indicate that they are the islands most likely to be reached from South America via the strong east-to-west currents and winds at these equatorial latitudes4,40,41.

We cannot discount an alternative explanation: a group of Polynesian people voyaged to northern South America and returned42 together with some Native American individuals, or with Native American admixture, as speculated in ref. 10. We have dated the contact event to the time when Polynesian explorers were, according to some studies, making their longest-range voyages (the century surrounding ad 1200)—a time when these studies suggest that the Polynesian settlers discovered all remaining island groups in the Pacific, from Hawaii to New Zealand to Rapa Nui13,38,42. The Tuamotu Archipelago, which lies at the centre of the Polynesian islands in which we found a Native American component, is known to have been a Polynesian voyaging hub, and according to simulations it is the second most likely location to be reached when voyaging from South America4. Further population genetics collaborations with these genetically understudied island populations are needed to resolve these alternative hypotheses.

In conclusion, we find strong genetic evidence for pre-Columbian human trans-Pacific voyaging contact (at the turn of the twelfth century), contemporaneous with the Polynesian voyages of discovery in the remote eastern Pacific13,14. Previous studies of putative Polynesian–Native American contact have focused on Rapa Nui, whose modern genetic history has been influenced by a recent Chilean admixture event, and have missed the possibility, which we show to be more likely, that prehistoric contact occurred before the settlement of Rapa Nui. We show that evidence for early Native American contact is found on widely separated islands across easternmost Polynesia, including islands not influenced by more recent Native American contact events. Our results show the usefulness of genetic studies of modern populations, which allow for large sample sizes to unravel complex prehistoric questions, and demonstrate the importance of combining anthropological, mathematical and biological approaches to answer these questions.


No statistical methods were used to predetermine sample size. The experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment.

Ethical approval

Written informed consent was obtained from all participants, and research/ethics approval and permits were obtained from the following institutions: the Stanford University Institutional Review Board (IRB approval no. 20839), the Oxford University Tropical Research Ethics Committee (reference no. 537-14), and the Ethical Scientific Committee at the Pontificia Universidad Católica de Chile (reference no. 1971092), conducted in accordance with the guidelines of the National Commission on Science and Technology (CONICYT-Chile).

Sample collection and genotyping

This work combines publicly available genotype data and newly generated single-nucleotide polymorphism (SNP) array data from samples collected over different time periods by the participating institutions (Supplementary Tables 1–3). Sampled populations and genotyping platforms are detailed in Supplementary Table 1. A total of 25 populations were genotyped at the University of California, San Francisco (UCSF) using Affymetrix Axiom LAT-1 arrays. Another seven populations were genotyped using an Illumina multi-ethnic genotyping array (MEGA) or Illumina 610-Quad arrays (see Supplementary Table 1). Genotype calling was performed following default parameters using Affymetrix’s Genotyping Console software or Illumina’s GenomeStudio application, as appropriate. The average call rate was 98.5% for all newly genotyped samples. Before filtering and merging, the total numbers of SNPs called on the Axiom LAT-1 and Illumina MEGA platforms were 813,036 and 1,738,289 respectively. To remove genotyping errors, all samples genotyped on the same array were filtered together using PLINK 1.9, eliminating the following: individuals missing more than 1% of genotype sites (--mind 0.01), SNPs missing in more than 1% of individuals (--geno 0.01), and SNPs out of Hardy–Weinberg equilibrium with a P value of less than 1 × 10−110.
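The two missingness filters can be sketched in plain Python to make the logic concrete. This is an illustration only, not a substitute for PLINK: the function name and the toy genotype matrix are hypothetical, individuals are filtered before SNPs (an ordering assumed here), and the Hardy–Weinberg filter is not reproduced.

```python
def filter_by_missingness(genotypes, max_ind_missing=0.01, max_snp_missing=0.01):
    """Return (kept_individuals, kept_snps) as lists of indices.

    genotypes: one list per individual, one entry per SNP; None marks a
    missing call. Individuals are filtered first, then SNP missingness is
    computed among the retained individuals (assumed ordering).
    """
    n_snps = len(genotypes[0])
    kept_inds = [
        i for i, row in enumerate(genotypes)
        if sum(g is None for g in row) / n_snps <= max_ind_missing
    ]
    if not kept_inds:
        return [], []
    kept_snps = [
        j for j in range(n_snps)
        if sum(genotypes[i][j] is None for i in kept_inds) / len(kept_inds)
        <= max_snp_missing
    ]
    return kept_inds, kept_snps

# Toy matrix (4 individuals x 5 SNPs). Thresholds are loosened to 0.25
# because a 1% cut-off is meaningless on so few calls.
toy = [
    [0, 1, 2, 0, 1],
    [None, None, None, 1, 0],  # 60% missing: individual dropped
    [1, 1, None, 2, 0],        # 20% missing: individual kept
    [0, 2, 1, 1, 1],
]
kept_inds, kept_snps = filter_by_missingness(toy, 0.25, 0.25)
```

Here the second individual is removed, and the third SNP is then removed because one of the three remaining individuals lacks a call there.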

Data preparation

We used the UCSC tool liftover to bring all data onto the same genome build, GRCh37 (hg19)51. When merging data from different genotyping arrays, strand flips were detected and corrected, with ambiguous SNPs (SNPs whose strand definition could not be definitively matched between arrays) removed. This typically resulted in a loss of fewer than 10% of SNPs. Hence, the resulting SNP density after merging with different reference panels varied across working data sets for downstream analyses, as detailed throughout the Methods. Genetic positions were assigned using the interpolated recombination map generated by the 1000 Genomes project43. Given the depth and quality heterogeneity of the ancient samples, we called pseudohaploid genotypes for all ancient individuals to minimize potential bias derived from calling diploid genotypes48. For each ancient genome, we discarded reads with mapping quality below 30 and bases with quality below 20.
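A pseudohaploid call at one site can be sketched as below, using the quality thresholds stated above. Drawing a single passing read at random is a common convention and is assumed here; the text does not specify how the allele was chosen. The function name and the read tuple layout are hypothetical.

```python
import random

def call_pseudohaploid(reads, min_mapq=30, min_baseq=20, rng=None):
    """Pick one allele for a site from the reads covering it.

    reads: iterable of (base, mapping_quality, base_quality) tuples.
    Reads with mapping quality below min_mapq, or bases with quality below
    min_baseq, are discarded. Returns None when nothing passes.
    """
    rng = rng or random.Random()
    passing = [base for base, mapq, baseq in reads
               if mapq >= min_mapq and baseq >= min_baseq]
    if not passing:
        return None
    return rng.choice(passing)  # random draw: an assumption of this sketch

# Toy pile-up: only the first read passes both filters.
site = [("A", 40, 30), ("G", 10, 35), ("G", 35, 5)]
allele = call_pseudohaploid(site, rng=random.Random(0))
no_call = call_pseudohaploid([("C", 5, 5)])
```

Representing each individual by a single sampled allele per site avoids having to distinguish heterozygotes from homozygotes at low depth, which is where diploid calling tends to be biased.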

Global ancestry analysis

Principal component analysis. PCA was performed using EIGENSOFT 7.2.1 (ref. 52) by merging the genotyped Polynesian individuals together with reference panels from Africa, Europe, Taiwan, Melanesia and the Americas (Supplementary Fig. 9). For African and European reference panels, we used genotypes from 1000 Genomes individuals: 60 Yoruban (YRI), 30 British (GBR) and 30 Spanish (IBS) individuals43. For Melanesian reference individuals, we used 16 individuals from Vanuatu; for Taiwan, we used 20 individuals from the Atayal and Paiwan indigenous groups; and for the Americas, we used 60 individuals with only Native American ancestry (as indicated by our ADMIXTURE analysis, Fig. 1b; see below) originating from Puno, Peru (Supplementary Table 1). Merging of the sequence and filtered genotype data (689,899 SNPs) was done with PLINK 1.9 (ref. 53), as was linkage disequilibrium pruning (LD pruning) (--indep-pairwise 50 10 0.5), which was used to greedily remove successive variants with a squared correlation greater than 0.5 in 50 SNP sliding windows with 10 SNP steps. Plotting was performed in R version 3.5.2 using the ggplot2 3.1.0 package54.
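The sliding-window pruning idea can be sketched in a few lines. This is a simplified illustration rather than PLINK's implementation: within each window it always drops the later SNP of a correlated pair, which is an assumption of the sketch, and the function names and toy genotypes are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treat as 0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)

def ld_prune(snps, window=50, step=10, r2_max=0.5):
    """Greedy pruning; snps is a list of per-SNP dosage vectors in genomic order.

    Returns the indices of retained SNPs.
    """
    removed = set()
    for start in range(0, len(snps), step):
        idx = [i for i in range(start, min(start + window, len(snps)))
               if i not in removed]
        for pos, i in enumerate(idx):
            if i in removed:
                continue
            for j in idx[pos + 1:]:
                if j not in removed and r_squared(snps[i], snps[j]) > r2_max:
                    removed.add(j)  # drop the later SNP (sketch's choice)
    return [i for i in range(len(snps)) if i not in removed]

# Toy data: SNP 1 duplicates SNP 0; SNPs 2 and 3 are uncorrelated with the rest.
toy_snps = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],
    [2, 0, 1, 1, 0, 2],
    [1, 2, 1, 1, 0, 1],
]
kept = ld_prune(toy_snps)
```

Pruning of this kind reduces the redundancy among nearby markers so that dense regions do not dominate frequency-based analyses such as PCA.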

Unsupervised ADMIXTURE. To explore Native American substructure in our Polynesian individuals, we merged the Pacific island individuals above together with European (10 UK, 10 Spain), African (20 Yoruba) and Pacific coastal Native American reference populations, which included Mapuche (6 Pehuenche, 14 Huilliche), Aymara (10 Puno, 10 Arica), Magdalena de Cao (19), Zenu (19) and Mexico (10 Mixe, 10 Zapotec). Samples from the two latter locations were genotyped on a second array (see Supplementary Tables 1, 3), so the merged data set of 489 individuals had an overlap of 134,281 SNPs. ADMIXTURE 1.3.0 was run on this data set using unsupervised mode17 (Supplementary Figs. 1, 2). According to the elbow55 in the cross-validation plot (Supplementary Fig. 2), a good clustering is found around K = 7. Because our data set is heavily imbalanced, with the bottlenecked Rapa Nui samples (n = 166) comprising nearly as much of our Pacific island data set (47%) as all other islands combined, a cluster corresponding to Rapa Nui related Polynesian ancestry emerges (see Supplementary Fig. 4). This issue was addressed using the iterative ADMIXTURE approach described below.

Iterative unsupervised ADMIXTURE. To avoid the spurious clustering56,57 that can be introduced by imbalanced sampling—such as in our Pacific island data set, which comprises 47% Rapanui individuals (described above; see Supplementary Figs. 1, 2 and Supplementary Discussion)—without having to downsample our Rapanui population, we used a new iterative unsupervised ADMIXTURE approach. Previous studies have addressed such spurious clustering, if properly recognized, by using supervised or semisupervised (projection) approaches58 or by simple downsampling of the overrepresented populations. We found none of those approaches to be fully satisfactory. Supervised learning requires a researcher to subjectively define clusters a priori, which does not allow ancestry patterns to emerge naturally from the data. A semisupervised approach—for example, running unsupervised ADMIXTURE on an evenly sampled data set, followed by projecting the remaining samples onto the clusters found—avoids these subjective biases, but generates noise in the projected samples. This noise manifests itself as small spurious proportions of all ancestries found in the projected samples, and stems from the fact that variants in the projected individuals were not able to inform the original clustering.

We solve both of these problems at once, albeit in a computationally intensive fashion, by using an iterative approach that allows every sample to participate in a fully unsupervised ADMIXTURE run, while ensuring that no one run suffers from a highly imbalanced data set. In particular, we chose evenly sampled reference numbers as is standard for a projection-type analysis (with additional representation from Native American populations, owing to their admixture with other ancestries). These were selected according to the original unsupervised ADMIXTURE (Supplementary Fig. 1) components: African, 20 Yoruban individuals; European, 10 Spanish and 10 British individuals; central Native American, 10 Mixe, 10 Zapotec, 19 Zenu, 20 Aymara, 19 Magdalena individuals; southern Native American, 20 Mapuche individuals (6 Huilliche, 14 Pehuenche); Polynesian, 2 individuals from each island; and Melanesian, 16 Vanuatuan individuals (Supplementary Fig. 5). Within the reference populations, those samples without recent admixture (typically less than 10%) according to our autosomal haplotype-based local ancestry analyses (see below) were chosen. This further eliminated imbalances in the ancestry cluster sizes represented by the reference panels. We then iteratively ran the reference panels together with each of the remaining Polynesian samples in a series of separate fully unsupervised ADMIXTURE analyses until all samples had been analysed. Each individual run was a standard downsampled unsupervised ADMIXTURE analysis. By repeating many such runs, all of the overrepresented Rapanui samples could be analysed, providing sufficient samples for our later compositional ancestry analyses (see below). The results for all individuals were then plotted59 using ggplot2 3.1.0 and Pophelper 2.2.9.
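The bookkeeping of the iterative scheme can be sketched as follows: every run contains the full balanced reference panel plus a small slice of the remaining samples, and the slices together cover each remaining sample exactly once. The number of samples added per run is not stated in the text, so `targets_per_run` is an assumed parameter, and the sample identifiers are invented. The sketch only builds the sample lists; running ADMIXTURE on each list is left to the external program.

```python
def iterative_batches(reference_ids, target_ids, targets_per_run):
    """Yield the sample list for each separate unsupervised run.

    Each list is the whole balanced reference panel followed by the next
    slice of not-yet-analysed target samples.
    """
    if targets_per_run < 1:
        raise ValueError("targets_per_run must be at least 1")
    for start in range(0, len(target_ids), targets_per_run):
        yield list(reference_ids) + list(target_ids[start:start + targets_per_run])

# Invented identifiers: 5 reference samples, 7 remaining samples, 3 per run.
reference = ["ref%d" % i for i in range(5)]
targets = ["rapanui%d" % i for i in range(7)]
runs = list(iterative_batches(reference, targets, targets_per_run=3))

# Collect the target samples seen across all runs.
analysed = [s for run in runs for s in run if s not in reference]
```

Because the reference panel is identical in every run, the over-represented population never makes up more than a small share of any single data set.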

Because our admixed Colombian and Ecuadorian reference panels were genotyped on a third array (Illumina 610-Quad), different from both of the two merged above (Affymetrix Axiom LAT-1 and Illumina MEGA), combining them with our panel would have resulted in further loss of common SNP markers giving an even lower resolution for rare ancestry components. Thus, when these samples were run iteratively (separately), they were run in their own lower-SNP-density (32,872 SNPs) three-way array merge with the reference panels. Owing to the lower density of SNPs, slightly more noise is evident in the ancestry assignments of these samples as compared with the higher-density, neighbouring American samples (see Fig. 1b). The same strategy was used for each of our ancient Native American samples. Each ancient sample was merged separately with the reference panels to maximize the SNP overlap (48,666 SNPs for the La Galgada sample; 114,927 SNPs for the Aconcagua sample; 25,429 SNPs for the best Saki Tzul sample; and 129,612 SNPs for the best Ancestral Kaweskar sample), and then unsupervised ADMIXTURE was run on each merge. Because the ancient sample genotypes were called pseudohaploid, the reference panel individuals were also treated as pseudohaploid for consistency in each of these ancient sample iterative runs.
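
Treating diploid reference genotypes as pseudohaploid can be illustrated with the short sketch below. It assumes genotypes are coded as alternate-allele counts (0, 1 or 2, with `None` for missing) and that one allele is drawn at random at heterozygous sites; the function name and coding are hypothetical and are not taken from the study.

```python
import random


def pseudohaploidize(genotypes, rng):
    """Convert diploid alternate-allele counts (0, 1, 2 or None) into a
    single sampled allele per site (0, 1 or None)."""
    haploid = []
    for g in genotypes:
        if g is None:
            haploid.append(None)                 # missing stays missing
        elif g == 1:
            haploid.append(rng.choice((0, 1)))   # heterozygous: draw one allele
        else:
            haploid.append(g // 2)               # homozygous: 0 -> 0, 2 -> 1
    return haploid


calls = pseudohaploidize([0, 1, 2, None, 1], random.Random(0))
print(calls)
```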

Marker-frequency-based statistics. To further confirm the existence of Native American ancestry in Polynesia, we conducted genome-wide admixture f4 and D-statistic tests across 689,899 SNPs. For the target Polynesian individuals, we pooled individuals from those islands with a greater than 1% average Native American component in our ADMIXTURE analysis (Supplementary Table 4), selecting all individuals without later European or African admixture—that is, with a proportion of European and/or African ancestry of no more than 0.005: Mangareva (3), North Marquesas (1) and Rapanui (6) individuals. The f4 statistics were computed using comparison populations from Europe (UK and Spain), Africa (Yoruba), China, and Vietnam from 1000 Genomes18, along with Native American individuals from Peru (Aymara) and Polynesian individuals from Mauke, all of which had shown no admixture in our previous analyses. With the program fourpop60, we computed f4 statistics of the following form: f4(target Polynesians, Mauke; X, Y), where X and Y represent all possible combinations of the other populations (Supplementary Fig. 10). Standard errors were estimated by block-jackknife with a block size of 500. As an additional verification of significance, we ran 100 coalescent simulations via fastsimcoal using the method of ref. 61 to estimate the proportion of simulated jackknife blocks larger than those observed for f4(target Polynesians, Mauke; Aymara, Yoruba) (Supplementary Fig. 10). None were observed. To further test whether the target Polynesian individuals carry Native American ancestry, we computed D-statistics35 of the following form: D(Mauke, target Polynesians; H3, Yoruba). In this case, H3 is a set of reference populations and individuals that include precontact ancient Native American genomes (Supplementary Tables 1, 3, 4). We again estimated standard errors through a block-jackknife procedure (Supplementary Fig. 11).
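
As an illustration of the form of these tests, the sketch below computes an f4 statistic from per-SNP population allele frequencies and estimates its standard error with a delete-one-block jackknife. It is a simplified stand-in for the fourpop program, not a reimplementation of it: it assumes blocks are defined by a fixed count of consecutive SNPs and uses the unweighted jackknife variance, which is exact only for equally sized blocks.

```python
import math


def f4(pa, pb, pc, pd):
    """f4(A, B; C, D): mean over SNPs of (pA - pB) * (pC - pD)."""
    terms = [(a - b) * (c - d) for a, b, c, d in zip(pa, pb, pc, pd)]
    return sum(terms) / len(terms)


def block_jackknife_f4(pa, pb, pc, pd, block_size):
    """Return (estimate, standard error, Z-score) from a delete-one-block
    jackknife over consecutive blocks of `block_size` SNPs."""
    n = len(pa)
    starts = list(range(0, n, block_size))
    if len(starts) < 2:
        raise ValueError("need at least two blocks")
    estimate = f4(pa, pb, pc, pd)
    leave_one_out = []
    for s in starts:
        keep = [i for i in range(n) if not (s <= i < s + block_size)]
        subset = [[p[i] for i in keep] for p in (pa, pb, pc, pd)]
        leave_one_out.append(f4(*subset))
    g = len(leave_one_out)
    mean_loo = sum(leave_one_out) / g
    variance = (g - 1) / g * sum((x - mean_loo) ** 2 for x in leave_one_out)
    se = math.sqrt(variance)
    z = estimate / se if se > 0 else float("inf")
    return estimate, se, z


# Toy allele frequencies (hypothetical), four SNPs in two blocks.
target = [0.9, 0.8, 0.7, 0.6]
mauke = [0.1, 0.2, 0.3, 0.4]
pop_x = [0.5, 0.6, 0.5, 0.6]
pop_y = [0.1, 0.1, 0.2, 0.2]
print(block_jackknife_f4(target, mauke, pop_x, pop_y, block_size=2))
```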

Compositional analyses of ancestry proportions

Thanks to the large sample size of Rapanui individuals (n = 166), we were able to conduct statistical analyses of the ancestry proportions of this population, and thus to characterize the associations between the different ancestries. We consider first all four of the ancestry proportions identified by our iterative ADMIXTURE analysis in the Rapanui population: central Native American, southern Native American, European and Polynesian. We neglect the African component in the Rapanui individuals, as it is present in only 12 individuals with a proportion above 0.005, and so these dozen individuals are simply excluded.

Because these ancestry components ($p_i$ for each ancestry $i$) are constrained to lie on a simplex (that is, $\sum_i p_i = 1$; termed compositional data), computing raw covariances and correlations between ancestry components is not informative. (As one ancestry proportion rises, the others must fall, leading to intrinsic negative covariances and correlations.) Thus, we rely on statistical methods developed for compositional data49 to characterize associations between ancestry components. In particular, we compute the log-ratio variance for each pair of ancestries $i$ and $j$, $\tau_{ij} = \mathrm{Var}[\ln(p_i/p_j)]$, which together completely characterize the covariance structure of the composition49 (see Supplementary Table 7). Smaller values of $\tau_{ij}$ indicate that one component does not vary much relative to the other, and larger values indicate that the ancestry components vary freely relative to one another. We also compute the compositional analogue to correlation62, $\rho_{ij} = \exp(-\tau_{ij}^2/2)$ (Supplementary Table 8). To visualize these associations, we plot the ancestry composition of each individual inside the four-component simplex, namely a tetrahedron (Supplementary Fig. 12).
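
A minimal sketch of these two quantities is given below, assuming strictly positive ancestry proportions (zeros would need to be handled before taking logarithms) and the sample variance as the variance estimator. The toy proportions are hypothetical; the correlation analogue follows the formula quoted above.

```python
import math
from statistics import variance


def log_ratio_variance(p_i, p_j):
    """tau_ij: variance across individuals of ln(p_i / p_j)."""
    return variance([math.log(a / b) for a, b in zip(p_i, p_j)])


def compositional_correlation(tau):
    """Compositional analogue to correlation: exp(-tau^2 / 2)."""
    return math.exp(-tau ** 2 / 2)


# Three hypothetical individuals; each composition sums to 1.
polynesian = [0.60, 0.30, 0.45]
native_american = [0.20, 0.10, 0.15]   # always one third of Polynesian
european = [0.20, 0.60, 0.40]

tau_pn = log_ratio_variance(polynesian, native_american)
tau_pe = log_ratio_variance(polynesian, european)
print(tau_pn, compositional_correlation(tau_pn))
print(tau_pe, compositional_correlation(tau_pe))
```

In this toy case the Polynesian and Native American components keep a fixed ratio, so their log-ratio variance is essentially zero and the correlation analogue is essentially one, whereas the Polynesian and European components vary relative to one another.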

We next analyse the subset of the Rapanui individuals without a southern Native American (Chilean) component, that is, the 64 Rapanui individuals with less than 1% southern Native American ancestry in the K = 6 iterative ADMIXTURE analysis (Fig. 1b). These individuals lie on the triangular simplex (Fig. 2b) that forms the base of the tetrahedral simplex above. Within this subset of individuals, we also compute $\tau_{ij}$ and $\rho_{ij}$ for each of their ancestry pairings $i$, $j$ (Supplementary Tables 9, 10). To confirm the observed association between the central Native American component and the Polynesian component, we also perform a compositional (log-contrast) PCA on these individuals. As the points lie on a two-dimensional compositional simplex within $\mathbb{R}^3$, we map them to a two-dimensional linear subspace of $\mathbb{R}^3$ (the subspace orthogonal to the vector [1,1,1]) using the centred log-ratio transform (clr)62. This isometry transforms each individual’s vector of ancestry components $[p_i, p_j, p_k]$ by replacing each ancestry proportion with the log of that proportion divided by the geometric mean of all ancestry proportions, that is, $\mathrm{clr}(p_i) = \ln\left(p_i/(p_i p_j p_k)^{1/3}\right)$.
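
The centred log-ratio transform itself is short enough to sketch directly. The function below is an illustrative stand-alone version (the study used the R Compositions package); it assumes strictly positive proportions.

```python
import math


def clr(composition):
    """Centred log-ratio transform: the log of each part divided by the
    geometric mean of all parts, i.e. log(part) minus the mean log."""
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


# Hypothetical [Polynesian, European, Native American] proportions.
transformed = clr([0.7, 0.2, 0.1])
print(transformed, sum(transformed))
```

The transformed coordinates always sum to zero, which is why the transformed points lie in the subspace orthogonal to [1,1,1].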

We then perform a standard singular value decomposition on the centred compositional vectors in this space to determine the principal components. Because we are now in a two-dimensional Euclidean subspace, we find exactly two principal components: the first component ($v_1$) and the vector orthogonal to it ($v_2$) in this subspace, $v_1$ = [Polynesian, European, Native American] = [0.411, 0.816, 0.405], and $v_2$ = [Polynesian, European, Native American] = [−0.705, −0.0037, 0.709], having corresponding singular values of $\sigma_1 = 2.26$ and $\sigma_2 = 0.219$, respectively. Thus, less than 1% of the variance in the clr-transformed space occurs along the second principal component: $\sigma_2^2/(\sigma_1^2 + \sigma_2^2) = 0.009 < 1\%$. As there is almost no variation along the second principal component, the projection of ancestries along this direction in clr-space is approximately constant, so $[-0.705, -0.0037, 0.709] \cdot \ln\left([P, E, N]/(PEN)^{1/3}\right)$ is approximately constant, where $P$, $E$ and $N$ are respectively the Polynesian, European and central Native American ancestry proportions in an individual. Because the entries of $v_2$ sum to approximately zero, the geometric-mean term cancels, and equivalently $P^{-0.705} E^{-0.0037} N^{0.709}$ is approximately constant. Raising both sides of the latter relation to the power 1/0.705 = 1.42, we have constant ≈ $N^{1.006} E^{-0.005}/P \approx N/P$. In other words, the central Native American component ($N$) varies directly with the Polynesian component ($P$) in these Rapanui individuals, and both vary freely relative to the European component ($E$). Compositional analyses and plots were made in R using the Compositions 1.4.0 package62.
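
The arithmetic in this paragraph can be checked from the quoted singular values and second principal component alone. The short script below is only a numerical check of those quoted values; the variable names are illustrative.

```python
# Quoted singular values and second principal component.
sigma1, sigma2 = 2.26, 0.219
v2 = {"P": -0.705, "E": -0.0037, "N": 0.709}

# Share of clr-space variance lying along the second principal component.
second_pc_share = sigma2 ** 2 / (sigma1 ** 2 + sigma2 ** 2)

# The entries of v2 sum to roughly zero, so the geometric-mean term drops out.
v2_sum = sum(v2.values())

# Raising P^-0.705 * E^-0.0037 * N^0.709 to the power 1/0.705 rescales
# every exponent by that factor.
power = 1 / 0.705
exponents = {name: value * power for name, value in v2.items()}

print(round(second_pc_share, 4))   # about 0.0093
print(round(power, 2))             # about 1.42
print({k: round(v, 3) for k, v in exponents.items()})
```

The rescaled exponents are close to −1 for P, 0 for E and 1 for N, which is the N/P relation stated in the text.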

Local ancestry inference

Reference panels for local ancestry. For our Axiom LAT-1 array analyses (689,899 SNPs), we used a balanced set of reference individuals consisting of: African (60 Yoruban individuals), European (30 Spanish and 30 British individuals), Native American (60 unadmixed Native American Aymara individuals genotyped on the Axiom LAT-1 array) and Polynesian (60 individuals identified by ADMIXTURE (Fig. 1b) as having less than 1% non-Polynesian ancestry) (Supplementary Tables 1, 3). Note that our local ancestry inference method, RFMix, can identify admixture in its reference panels, if such admixture exists, through its expectation maximization iterations18. For our Illumina MEGA array analysis (896,557 SNPs), we used as reference those same European and African reference panels together with 60 unadmixed Native American Aymara individuals genotyped on the MEGA Illumina array (Supplementary Table 3). For our Illumina 610-Quad array (620,901 SNPs) analyses, we used the local ancestry results of ref. 63.

Phasing. All samples were phased together using SHAPEIT v2.837 with default parameter settings64. Population phasing has been shown to be particularly effective in highly related, small founder populations64, such as those on these remote Polynesian islands.

RFMix. The program RFMix v1.5.4 uses a conditional random field smoother to stitch together the results of random forest classifiers applied to successive windows of SNP markers to recognize local autosomal haplotype variant patterns (linked sequences of SNPs) characteristic of different ancestries18 (Fig. 2a). Methods that ignore SNPs’ relative positions and linkage (for example, f4 and D-statistics, ADMIXTURE and PCA) are blind to such characteristic sequence patterns. RFMix is a semisupervised learning approach that requires reference panels from each ancestry of interest, as described in detail above. We ran RFMix with the recommended two expectation maximization iterations and a 2-millimorgan window size to identify genomic regions of Polynesian, European, African and Native American ancestry in our Pacific island samples, and genomic regions of European, African and Native American ancestry in our populations from the Pacific coast of the Americas. We chose these reference ancestries on the basis of our unsupervised ADMIXTURE analyses (Supplementary Figs. 1, 5), which had indicated the presence of these continental ancestries in our samples.

Ancestry-specific analyses

For the ancestry-specific analyses below, all ancestries except the ancestry of interest are ‘masked’ within each sample by thresholding the posterior probabilities returned by RFMix at a 0.99 probability level for the ancestry of interest. In other words, all haploid markers along each individual’s genome that are inferred to come from a different ancestry than the one of interest are treated as missing. In addition, on Rapa Nui a high Native American ancestry population group is defined to be those Rapanui individuals with greater than 40% of their genome in inferred Native American ancestry segments according to RFMix. This group also has much higher European ancestry than average for the island of Rapa Nui, as southern Native American ancestry and European ancestry are associated on the island (see above and Supplementary Tables 7, 8), and so we also refer to this group as ‘high European Rapa Nui’ in later analyses.
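
The masking step can be sketched as below. The data layout is hypothetical (one allele and one dictionary of ancestry posteriors per marker along a haplotype) and does not reflect the RFMix output format; the sketch also assumes that a marker is kept when its posterior for the ancestry of interest is at least the threshold.

```python
def mask_haplotype(alleles, posteriors, ancestry, threshold=0.99):
    """Keep an allele only where the posterior probability of the ancestry
    of interest reaches the threshold; otherwise record it as missing."""
    masked = []
    for allele, posterior in zip(alleles, posteriors):
        if posterior.get(ancestry, 0.0) >= threshold:
            masked.append(allele)
        else:
            masked.append(None)
    return masked


# Hypothetical haplotype of three markers with per-marker posteriors.
alleles = [1, 0, 1]
posteriors = [
    {"NAT": 0.995, "POL": 0.005},
    {"NAT": 0.600, "POL": 0.400},
    {"NAT": 0.000, "POL": 1.000},
]
print(mask_haplotype(alleles, posteriors, "NAT"))
print(mask_haplotype(alleles, posteriors, "POL"))
```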

Ancestry-specific PCA. The two masked haploid genomes (haplotypes) for each individual are combined to generate a genotype frequency vector, with 0 representing no alternate allele seen at a marker, 0.5 representing one alternate allele and one reference allele seen, and 1 representing no reference allele seen. For markers at which both haploid genomes had missing data in a given individual, a missing value is recorded for that marker site. These genotype frequency vectors for each individual are assembled to create a masked ancestry-specific genotype frequency matrix $X$, with $n$ samples (rows) and $p$ SNPs (columns). This masked matrix is then completed using the singular value decomposition (SVD), with cross-validation used to determine the optimal reconstruction dimensionality (Supplementary Figs. 15, 17), to produce a completed matrix $Y$ (ref. 65). Some individuals (rows) have large numbers of masked markers, so to reduce noise in the PCA we perform a weighted, rather than typical unweighted, PCA. The weights allow the principal components to be defined more heavily by the less masked, more precisely known, samples. As the variance of an estimated sample increases with the number of missing (masked) sites in that sample, we compensate by weighting each row (sample) proportionally to the fraction of non-missing sites present in that row (sample). Thus, we compute the SVD of $W^{1/2} Y_c$, where $Y_c = Y - \frac{1}{n}\,\mathbf{1}_n \mathbf{1}_n^{\mathsf{T}} W Y$, with $Y_c$ being the weight-centred, completed sample matrix, $W$ the diagonal matrix of weights, and $\mathbf{1}_n$ the $n$-element vector of ones. (Computing the SVD of $W^{1/2} Y_c$ is equivalent to diagonalizing the $Y_c^{\mathsf{T}} W Y_c$ matrix.) The diagonal elements of $W$ are given by $w_i = f_i / \sum_j f_j$ for $i = 1, \dots, n$, with $f_i$ being the fraction of SNP sites present (not masked) in row $i$.
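
The first and last steps of this procedure, building the genotype frequency vector and the row weights, can be sketched as follows. Matrix completion and the weighted SVD are omitted. One point is an assumption of this sketch rather than something stated above: when only one of the two haplotypes is unmasked at a marker, the frequency is taken from that single allele.

```python
def genotype_frequencies(hap_a, hap_b):
    """Combine two masked haplotypes (alleles 0/1, None = masked) into
    alternate-allele frequencies: 0, 0.5, 1, or None if both are masked."""
    combined = []
    for a, b in zip(hap_a, hap_b):
        present = [x for x in (a, b) if x is not None]
        combined.append(sum(present) / len(present) if present else None)
    return combined


def row_weights(matrix):
    """w_i = f_i / sum_j f_j, with f_i the fraction of unmasked sites in row i."""
    fractions = [sum(v is not None for v in row) / len(row) for row in matrix]
    total = sum(fractions)
    return [f / total for f in fractions]


# Two hypothetical individuals, four markers each.
ind1 = genotype_frequencies([0, 1, 1, None], [0, 0, 1, None])
ind2 = genotype_frequencies([None, None, 1, None], [None, None, 1, None])
weights = row_weights([ind1, ind2])
print(ind1, ind2, weights)
```

The heavily masked second individual receives a quarter of the total weight here, so it contributes correspondingly less to the definition of the principal components.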

We apply this algorithm to the samples genotyped on the Affymetrix Axiom LAT-1 array (689,899 SNPs) and to the merge of samples genotyped on this array together with additional American Pacific coast reference individuals genotyped on the Illumina MEGA array (two-array intersection of 91,835 SNPs) (Supplementary Figs. 14, 16, respectively). In the first PCA, only individuals with at least 90,000 SNP markers in Native American tracts (unmasked SNPs) were plotted, as individuals with fewer SNPs suffer from greater noise (scatter) in their projections. For higher resolution (less noise), Native American genomic regions from all individuals on an island are also used to plot island-specific genotype frequency vectors. These genotype frequency vectors are formed by aggregating the Native American ancestry fragments from all individuals on the same island and calculating, for each marker, the ratio of the number of alternate alleles seen at that marker to the total number of unmasked alleles at that marker on that island. In the second PCA, which has far fewer SNPs from the outset, only island-specific genotype frequency vectors were plotted, except for the high Native American Rapa Nui, who each individually have sufficient numbers of Native American SNPs to be plotted separately without excessive noise.
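
The island-level aggregation described above amounts to a per-marker ratio of alternate alleles to unmasked alleles. A minimal sketch, assuming each masked haplotype is a list of 0/1 alleles with `None` for masked sites (a hypothetical layout):

```python
def island_frequencies(masked_haplotypes):
    """Per-marker ratio of alternate alleles to unmasked alleles, pooled
    over every masked haplotype sampled on one island."""
    n_markers = len(masked_haplotypes[0])
    frequencies = []
    for m in range(n_markers):
        observed = [h[m] for h in masked_haplotypes if h[m] is not None]
        frequencies.append(sum(observed) / len(observed) if observed else None)
    return frequencies


# Four hypothetical haplotypes from one island, three markers.
island = [
    [1, None, None],
    [0, 1, None],
    [1, None, None],
    [None, 1, None],
]
pooled = island_frequencies(island)
print(pooled)
```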

Ancestry-specific MDS. Ancestry-specific MDS makes use of the fact that distances can be computed between pairs of genotypes, even if some markers are missing (masked) in each individual, simply by normalizing by the number of markers present for comparison in each individual. This approach was first pursued by Browning et al.66, who noticed that the resulting distance matrix may still contain missing elements—namely, when two samples have no non-missing ancestry segments in common. In this case, Browning et al.66 suggest completing each missing distance matrix entry by using the average distance of that individual against all others (mean imputation). However, one can construct a better estimate by noting that distance matrices have a high degree of structure—in particular, their elements must obey the triangle inequality. This allows missing values to be estimated by finding all possible triangles formed by the two samples that have no overlap and a third sample with which both do overlap. The common missing leg is then taken to be the minimum, over all these triangles, of the sum of their two known legs67. This triangulation allows the missing distance to be estimated from the known distances, rather than simply replacing it with a population-wide mean, giving much more accurate estimates for individuals with large amounts of masked ancestry (as found in Pacific islanders in Native American ancestry-specific analyses). As an additional advantage, none of the inferred distances will violate the triangle inequality; this is not true for the method of Browning et al.66.

We implement this triangle-based algorithm to create an ancestry-specific approach to MDS that is accurate even for highly admixed samples. We use the average number of pairwise differences as a distance metric, as it is proportional to genetic drift68.
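
The triangle-based completion can be sketched as follows. This is an illustrative version, not the implementation used in the study: the distance here is a mean absolute difference of genotype frequencies over the markers unmasked in both samples, used as a simple stand-in for the average number of pairwise differences, and each missing entry is filled with the minimum, over third samples, of the sum of the two known legs.

```python
def masked_distance(x, y):
    """Mean absolute genotype difference over markers present in both
    samples; None when the two samples share no unmasked marker."""
    shared = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not shared:
        return None
    return sum(abs(a - b) for a, b in shared) / len(shared)


def complete_by_triangles(dist):
    """Fill each missing d(i, j) with min over k of d(i, k) + d(k, j),
    using only the originally known distances."""
    n = len(dist)
    completed = [row[:] for row in dist]
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] is not None:
                continue
            bounds = [
                dist[i][k] + dist[k][j]
                for k in range(n)
                if k not in (i, j)
                and dist[i][k] is not None
                and dist[k][j] is not None
            ]
            if bounds:
                completed[i][j] = completed[j][i] = min(bounds)
    return completed


# Hypothetical distance matrix in which samples 0 and 2 do not overlap.
dist = [
    [0.0, 1.0, None, 4.0],
    [1.0, 0.0, 2.0, 1.0],
    [None, 2.0, 0.0, 0.5],
    [4.0, 1.0, 0.5, 0.0],
]
completed = complete_by_triangles(dist)
print(completed[0][2])
```

In this example the two candidate triangles give 1.0 + 2.0 and 4.0 + 0.5, so the missing entry is set to 3.0 rather than to a population-wide mean.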

We apply our ancestry-specific MDS method to the Native American ancestry-specific genotypes of Polynesian and American samples genotyped on Affymetrix Axiom LAT-1, Illumina MEGA and Illumina 610-Quad (Supplementary Fig. 14) and also to the European ancestry-specific genotypes of Polynesian individuals genotyped on Affymetrix Axiom LAT-1 together with European samples from POPRES genotyped on the Affymetrix GeneChip 500K and European full genomes from the 1000 Genomes Project (Supplementary Table 3 and Fig. 3a).

Procrustes. In order to confirm the findings of our new high-resolution ancestry-specific MDS and PCA methods described above, we carried out a traditional Procrustes analysis69 to combine two separate Native American ancestry-specific PCAs (ASPCAs) that were constructed by the older ASPCA method70. The first ASPCA (Supplementary Fig. 19) was constructed using American reference populations genotyped on Illumina MEGA combined with American reference populations and Polynesian populations genotyped on Axiom LAT-1 (a 91,835 SNP two-array intersection). The second ASPCA (Supplementary Fig. 20) was constructed using American reference populations genotyped on Axiom LAT-1, Illumina MEGA and Illumina 610-Quad (a 28,653 SNP three-array intersection). The local ancestry inference used for the masking of non-Native American ancestries for these ASPCAs was performed using the full-density SNP set of each of the three arrays, that is, before intersection (see ‘Local ancestry inference’ section above). The initial coordinates of the Pacific island individuals’ Native American ancestry were determined using the first ASPCA with the higher-density (91,835 SNP) two-array intersection. These positions were then mapped onto the lower-density (28,653 SNPs) three-array intersection of the second ASPCA, containing the full panel of American reference populations, using a Procrustes transform. The linear Procrustes mapping was identified by comparing the positions of the American reference individuals shared between the first ASPCA and the second ASPCA. Because these reference individuals have high Native American ancestry, they have few masked sites and suffer less from reduced SNPs in array intersections than the Pacific island samples (see Supplementary Fig. 20), resulting in less noisy positions.
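
The mapping step can be illustrated in two dimensions with the sketch below, which fits a least-squares scaling, rotation and translation on the shared reference individuals and then applies it to other points. It is a simplified stand-in for a full Procrustes analysis: it works only for two-dimensional coordinates, represents each point as a complex number, and does not allow reflections. All coordinates are hypothetical.

```python
def fit_similarity(source, target):
    """Least-squares scale + rotation + translation taking 2D source points
    onto target points. Returns complex (a, b) with z -> a * z + b."""
    zs = [complex(x, y) for x, y in source]
    zt = [complex(x, y) for x, y in target]
    mean_s = sum(zs) / len(zs)
    mean_t = sum(zt) / len(zt)
    numerator = sum((t - mean_t) * (s - mean_s).conjugate()
                    for s, t in zip(zs, zt))
    denominator = sum(abs(s - mean_s) ** 2 for s in zs)
    a = numerator / denominator      # combined scale and rotation
    b = mean_t - a * mean_s          # translation
    return a, b


def apply_similarity(a, b, points):
    """Apply z -> a * z + b to a list of (x, y) points."""
    mapped = [a * complex(x, y) + b for x, y in points]
    return [(z.real, z.imag) for z in mapped]


# Shared reference individuals in the first and second PCA (hypothetical).
refs_first = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
refs_second = [(1.0, 1.0), (1.0, 3.0), (-1.0, 1.0)]
a, b = fit_similarity(refs_first, refs_second)

# Map a non-reference individual from the first PCA into the second.
mapped = apply_similarity(a, b, [(1.0, 1.0)])
print(a, b, mapped)
```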

Identity-by-descent segment analysis

Germline. IBD segments were identified using GERMLINE 1.5.3 on the SHAPEIT phased haploid genomes using the haploid flag, allowing a maximum of four homozygous marker mismatches per IBD slice (-err_hom), a maximum of one heterozygous marker mismatch per IBD slice (-err_het), and a minimum length for IBD detection of 3 cM (-min_m)71.

Ancestry-specific filtering. IBD segments were then filtered on the basis of their overlap with local ancestry segments. For example, IBD segments located entirely within European ancestry segments, as previously determined by the local ancestry methods described above, are binned separately from those in Polynesian and Native American ancestries.
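
A minimal sketch of this filtering is shown below. It assumes that each haplotype's local ancestry is available as a list of non-overlapping (start, end, ancestry) tracts, with adjacent tracts of the same ancestry already merged, and that an IBD segment is assigned to an ancestry only if it lies entirely within a tract of that ancestry on both of the haplotypes sharing it. The coordinates and labels are hypothetical.

```python
def containing_ancestry(segment, tracts):
    """Ancestry label of the tract that fully contains the segment, if any."""
    start, end = segment
    for tract_start, tract_end, label in tracts:
        if tract_start <= start and end <= tract_end:
            return label
    return None


def bin_ibd_segment(segment, tracts_a, tracts_b):
    """Assign an IBD segment to an ancestry bin only when it sits entirely
    inside a tract of the same ancestry on both haplotypes."""
    label_a = containing_ancestry(segment, tracts_a)
    label_b = containing_ancestry(segment, tracts_b)
    if label_a is not None and label_a == label_b:
        return label_a
    return None


# Hypothetical local ancestry tracts (positions in cM) for two haplotypes.
tracts_a = [(0, 50, "POL"), (50, 80, "EUR"), (80, 120, "NAT")]
tracts_b = [(0, 40, "POL"), (40, 90, "EUR"), (90, 120, "NAT")]

print(bin_ibd_segment((55, 75), tracts_a, tracts_b))    # inside EUR on both
print(bin_ibd_segment((45, 60), tracts_a, tracts_b))    # crosses a boundary
```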

Ancestry-specific IBD networks. Previous publications on IBD networks have inferred the edge connections on the basis of the average total sum of IBDs shared between individuals within each pair of populations72,73, or on the total sum shared within a specific IBD segment length range74. Here we consider instead the number of individuals from two populations who are connected by IBD segments above a threshold length. Specifically, we consider the probability that an individual selected at random from population A shares substantial IBD (greater than 7 cM, to ensure no spurious matches) with an individual selected at random from population B. This can be easily computed by dividing the total number of such interisland individual pairs connected by more than 7 cM of IBD by the total number of possible interisland individual pairs. We construct two networks with edges reflecting these probabilities, one for IBD segments located entirely in European ancestry segments of the genome and another for Native American ancestry segments (Supplementary Fig. 13 and Supplementary Tables 11, 12). We do not plot Polynesian segment IBD probabilities, as we found all islands to share Polynesian ancestors with a probability of near one. Networks were plotted in R using the package qgraph75.
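
The edge weight defined above is a simple proportion of connected pairs, sketched below. The sketch assumes that the 7 cM threshold is applied to the total ancestry-specific IBD shared by a pair of individuals; the identifiers and sharing values are hypothetical.

```python
def sharing_probability(pop_a, pop_b, shared_cm, threshold_cm=7.0):
    """Fraction of all inter-population pairs whose shared IBD exceeds the
    threshold. `shared_cm` maps frozenset({id_a, id_b}) to shared cM."""
    pairs = [(a, b) for a in pop_a for b in pop_b]
    connected = sum(
        1 for a, b in pairs
        if shared_cm.get(frozenset((a, b)), 0.0) > threshold_cm
    )
    return connected / len(pairs)


# Hypothetical individuals on two islands and their shared IBD.
island_a = ["a1", "a2"]
island_b = ["b1", "b2", "b3"]
shared = {
    frozenset(("a1", "b1")): 12.5,
    frozenset(("a2", "b3")): 7.0,   # not above the threshold
    frozenset(("a1", "b2")): 3.2,
}
edge = sharing_probability(island_a, island_b, shared)
print(edge)
```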

Dating analyses

Tract-length distribution analysis. The timing of admixture events between different ancestral populations can be inferred by analysing the length distributions of genomic segments inherited from each ancestry, aggregated over all individuals in the studied population32. Here (Supplementary Fig. 23) we conduct our analysis separately on each island that possesses at least 1% average Native American component in both our ADMIXTURE (Supplementary Table 5) and RFMix (Supplementary Table 6) analyses. We considered Polynesian, Native American and European ancestries with genomic segments assigned by local ancestry inference using RFMix as described above. The small number of individuals with African ancestry were excluded, as above, because such ancestry is rare, likely to be post-colonial, and in any case not the focus of our present dating analysis. We excluded 12 individuals with an African ancestry proportion above 0.005 from Rapa Nui, 3 from South Marquesas and 6 from North Marquesas.

We used the Tracts method32 to fit three models with different sequences of historical admixture for each island: first, Polynesian–Native American admixture followed by later European admixture; second, European–Native American admixture followed by later Polynesian admixture; and third, Polynesian–European admixture followed by later Native American admixture. To optimize model parameters over the nonlinear likelihood surfaces, we ran Python’s COBYLA optimizer one hundred times each with different random starts for every population and model. The best-likelihood runs were chosen (Supplementary Table 13) and, although some random starts failed to converge, of those that did, most converged to similar maximum likelihoods and similar model parameters. The admixture model with the highest likelihood (Supplementary Table 13) was then selected. This method gives estimates of the time since each admixture event, measured in number of generations. To convert these to admixture dates, we used a generation time of 30 years (see Supplementary Discussion) and the sample collection dates in Supplementary Table 2. Tracts was also run separately, as described above, on the 64 Rapanui individuals identified as having less than 1% of a southern Native American component in our iterative ADMIXTURE analysis (see Supplementary Fig. 24).
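
The conversion from generations to a calendar date is simple arithmetic, sketched below under the simplest reading of the text: the admixture date is the sample collection year minus the number of generations multiplied by the 30-year generation time. The function name and the input values are hypothetical and are not results from the study.

```python
def admixture_year(collection_year, generations, years_per_generation=30):
    """Calendar year of an admixture event estimated to lie `generations`
    generations before the sample collection year."""
    return collection_year - generations * years_per_generation


# Hypothetical inputs, for illustration only.
print(admixture_year(1994, 20))
print(admixture_year(2013, 5.5))
```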

Linkage disequilibrium decay analysis. We also performed a complementary analysis of linkage disequilibrium decay using ALDER 1.0, which requires neither phasing nor local ancestry inference34. Observing that the Native American ancestry-specific IBD clustering network indicated common Native American ancestry in eastern Polynesia, we pooled individuals across islands for this analysis. In particular, from the islands that had a greater than 1% average Native American component in our ADMIXTURE analysis (Supplementary Table 5), we pooled all individuals without later European or African admixture, that is, with no more than 1% European and/or African ancestry: Mangareva (4), Palliser (2), North Marquesas (1) and Rapa Nui (6). As ancestral reference proxies, we used 30 unadmixed Native American Aymara individuals and the 22 Austronesian individuals from the Atayal and Paiwan of Taiwan (Supplementary Tables 1, 3). We used a total of 690,692 SNPs in this analysis (see Supplementary Fig. 23g).

Because we had a large number of samples from Rapa Nui, we hoped to have increased resolution in our dating analysis there. However, all but six of the Rapanui individuals have European admixture, so we could not use the two-population model of ALDER on the full set of Rapanui individuals, and instead used MALDER 1.0 (ref. 76) (Supplementary Table 14). For the Rapanui individuals, we could not pool those sampled in 1994 with those sampled in 2013 for this dating analysis, because almost one generation separates these two collections. We focused on the 1994 Rapa Nui samples, as this collection had lower amounts of the modern southern Native American and European ancestries (Supplementary Table 5). In addition, to reduce the complexity of the admixture model, we excluded the 13 Rapanui individuals from the 1994 samples having African ancestry (greater than 1% in our ADMIXTURE analysis; Fig. 1b), leaving 73 individuals. For our ancestral reference proxies, we used 30 unadmixed Native American Aymara individuals, 30 European individuals from Spain, and the 22 Austronesian individuals (Atayal and Paiwan) from Taiwan (Supplementary Tables 1, 3).

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.