Introduction

Aboriginal Australians represent one of the oldest continuous cultures in the world, as their ancestors were among the first groups to leave Africa and travel through the Middle East, Asia and then Southeast Asia (i.e., the ancient landmass called Sunda).1, 2, 3 A recent review of the archeological record of the ancient landmass of Sahul (which comprised present-day New Guinea and Australia) concluded that anatomically modern humans colonized Sahul at least 47 thousand years ago (KYA).4 The timing of this migration, the route(s) taken and the entry point(s) into Sahul have been subject to ongoing debate by archeologists, anthropologists and geneticists for many years, with evidence both for and against a single colonization event.4, 5, 6, 7, 8, 9, 10, 11, 12, 13 Unfortunately, the adverse impacts of European colonization on Aboriginal society since the eighteenth century seriously confound attempts at reconstructing the past genetic structure of Aboriginal society.14

Analyses of the uniparentally inherited Y chromosome and mitochondrial DNA (mtDNA) and the biparentally inherited autosomes of Aboriginal Australians have been crucial in identifying the possible migration routes and the timing and number(s) of potential colonization events their ancestors undertook to reach the continent. Whole-genome studies using the DNAs of both a 100-year-old Aboriginal hair sample15 and a number of present-day Aboriginal Australians13, 16, 17 support the great antiquity of the settlement of Australia by the ancestors of the Aboriginal Australians, as well as their deep connection with indigenous populations of New Guinea. However, one genome-wide single-nucleotide polymorphism (SNP) study claimed that, in addition to the initial settlement, substantial gene flow from India into Australia occurred during the Holocene.8

Y-chromosome and mtDNA analyses also support an ancient settlement of Australia. However, inferences have generally been limited by the low sample numbers of 10 to 100 individuals and/or the few geographic regions within Australia that have been surveyed.6, 7, 13, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38 Despite this, a recent comprehensive study of Y-chromosome variation in Aboriginal males suggested a settlement of Australia at least 45 KYA,39 with this date being subsequently supported by the deep sequencing of Aboriginal Y chromosomes.40

Most previous mtDNA data, by far, have been obtained from groups in the Northern Territory, northern Western Australia and western New South Wales.6, 21, 22, 23, 33, 41 There are limited data from South Australia and Queensland (except for the samples reported by Malaspinas et al.13 and Ballantyne et al.,41 the latter of which we reanalyze here), and none are available from Victoria, although these States contain large populations of Aboriginal people. For example, one-fifth of all Aboriginal people reside in Queensland.42 Also, as not all of the studies typed the same mtDNA SNPs or sequenced hypervariable segments I and II (HVS-I and HVS-II), it is difficult to compare some of these data sets, although van Holst Pellekaan14 attempted to do so.

mtDNA haplogroups acknowledged as uniquely Aboriginal Australian fall within macrohaplogroups M and N (including R). Haplogroups presently considered autochthonous to Australia are N13, O, M42a, M14, M15, S and numerous P subtypes (P3a, P4b, P5, P6, P7, P8). Current time to most recent common ancestor (TMRCA) estimates indicate great antiquity for most of them.6, 14, 21, 22, 33, 43, 44

The absence of mtDNA data for most of Australia limits inferences about the distribution of these haplogroups throughout the continent, Aboriginal Australians’ relationships with neighboring populations and the maternal genetic history of Australia as a whole. A common misconception about Aboriginal Australians is that they are culturally and linguistically a homogeneous population, as discussed in Hiscock.45 Before European settlement in the eighteenth century, there was a well-defined Aboriginal language structure, as described by Horton.46 Consequently, a genetic survey of one region would not necessarily encapsulate their overall diversity. van Holst Pellekaan et al.,21 for example, described significant differences in the HVS-I haplotypes between the Aboriginal people of the Darling River in New South Wales and the Yuendumu of Central Australia (the Northern Territory).

Given the lack of whole mtDNA genome sequences of Aboriginal Australians (there are 38 in GenBank as of June 2016), and, therefore, the paucity of detailed phylogenetic knowledge regarding Aboriginal haplogroups, assignment to haplogroups based on limited SNP information is problematic for many samples. As only one representative mitogenome sequence is available for certain haplogroups (for example, M15, S5 and P5), HVS-I motifs are inadequate for Australian-specific haplogroup assignment. Also, given the antiquity of these haplogroups, the HVS, due to its elevated average evolutionary rate compared with the mtDNA coding region, has most likely accumulated recurrent mutations that may obscure phylogenetic signatures. Furthermore, given the antiquity of Australian-specific haplogroups, their HVS haplotypes are characteristically more divergent than those of younger haplogroups. These factors render haplogroup assignment made from HVS data alone challenging or impossible. However, as emphasized by van Holst Pellekaan,14 studies that have limited coverage of mtDNA sequences nonetheless provide important information about family genealogies as well as act as a useful sorting method for subsequent whole mtDNA genome sequencing.6, 22, 47

To expand knowledge of mtDNA diversity among this poorly studied population, we assembled a large sample of self-reported Aboriginal Australian individuals for the analysis of both mtDNA SNPs and HVS variation. This sample set was drawn from widespread locations across Australia, especially from areas either not previously sampled, such as Victoria, or previously poorly sampled, such as Queensland and Tasmania. Additionally, the sample included individuals whose ancestry lay in one of the Torres Strait Islands, which lie between Cape York in Far North Queensland and Papua New Guinea, and represent the remnants of the former land bridge between Australia and New Guinea.

This study aims to investigate the distribution of indigenous mitochondrial haplogroups and associated HVS variation across the continent, to infer ages of these lineages relevant for studying human history and to explore the maternal genetic relationship of the Aboriginal Australian people with neighboring populations.

Materials and methods

Sample collection

A total of 502 buccal swab, mouthwash or Oragene samples were collected from self-declared Aboriginal Australians residing in the States of Queensland, New South Wales, Victoria, Western Australia and Tasmania, as well as the Northern Territory. It is important to note that Aboriginal Australian is a culturally based affiliation and not one defined by a person’s genetic composition. The current residence of participants may not represent the homeland or even birthplace of the individual, but, in the absence of genealogical information from the majority of the participants, assignment of samples by the State of residence was the best possible treatment of the data. This study was conducted with the approval of the La Trobe University Human Ethics Committee.

In addition, the 92 Aboriginal DNA samples from the Northern Territory (n=76), Queensland (n=10) and South Australia (n=6) previously reported in Ballantyne et al.41 were available for HVS sequencing and included in these analyses, giving an overall total of 594 samples.

DNA extraction

DNA was extracted from buccal swabs and mouthwash samples following the method of de Vries et al.48 and from the Oragene saliva collection tubes following the manufacturer's protocol.

MtDNA SNP typing and HVS sequencing

Selected coding-region mtDNA SNPs were genotyped using custom TaqMan assays (Applied Biosystems, Foster City, CA, USA). The hierarchical SNP genotyping protocol was as follows (and is depicted in Figure 1). All samples were typed for basal haplogroup L (nucleotide positions 3594, 13 650 and 7256), M (nucleotide position 14 783), N (nucleotide positions 10 873, 8701 and 9540) and R (nucleotide position 12 705) macrohaplogroup markers. Individuals who belonged to macrohaplogroup N were subsequently typed for the 8404 SNP, which denotes haplogroup S, and SNP 9140, which denotes haplogroup O. Those individuals who belonged to haplogroup R were subsequently typed for nucleotide position 15 607, which characterizes both haplogroups P and T. Those individuals who belonged to macrohaplogroup M were typed for M42a (nucleotide position 12 771) and M29/Q (nucleotide position 13 500).
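
The typing order described above can be sketched as a small decision function. This is an illustrative sketch only and not the pipeline used in this study: the function name `classify`, the marker keys and the returned labels are hypothetical, and each marker is reduced to a Boolean call because the diagnostic allele states are not given in the text.

```python
# Hypothetical input: a dict mapping a marker label to True when the sample
# carries the haplogroup-defining state at the listed position(s).
# Allele states are not stated in the text, so they are abstracted away.

def classify(calls):
    """Walk the hierarchical typing order and return a haplogroup label."""

    def has(marker):
        return bool(calls.get(marker, False))

    if has("M"):                  # position 14783
        if has("M42a"):           # position 12771
            return "M42a"
        if has("M29/Q"):          # position 13500
            return "M29/Q"
        return "M*"
    if has("N"):                  # positions 10873, 8701 and 9540
        if has("R"):              # position 12705
            if has("P/T"):        # position 15607 (shared by P and T)
                return "P/T"
            return "R*"
        if has("S"):              # position 8404
            return "S"
        if has("O"):              # position 9140
            return "O"
        return "N*"
    if has("L"):                  # positions 3594, 13650 and 7256
        return "L"
    return "unresolved"


print(classify({"N": True, "R": True, "P/T": True}))
print(classify({"M": True, "M42a": True}))
```

Because position 15 607 is shared by haplogroups P and T, a label such as "P/T" would still need HVS data to be resolved, as described in the haplogroup assignment step.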

Figure 1

mtDNA classification tree used for haplogroup assignment in the present study.

mtDNA haplogroups considered to be indigenous to Australia that are supported by observations are M14, M15, M42a, N13, O and subtypes, S and subtypes and specific P subtypes: P3a, P4b, P5, P6, P7 and P8.6, 14, 21, 22, 33, 43 It is unclear whether haplogroup Q is indigenous to Australia as discussed in Hudjashov et al.6 All other mtDNA haplogroups typed are considered non-indigenous as they could be explained by historical Eurasian admixture. The 92 samples of Ballantyne et al.41 had previously been SNP genotyped using a specifically designed SNaPshot assay that comprised the major mtDNA haplogroups of Oceania.

In all samples, HVS-I (nucleotide positions 16 000 to 16 400) and HVS-II (nucleotide positions 40 to 310) were sequenced at the Australian Genome Research Facility, Melbourne, Australia, with the ABI Big Dye Terminator v.3.1 Cycle Sequencing Kit (Applied Biosystems) and analyzed on an ABI 3730 DNA Analyzer (Applied Biosystems). Primers are described in Supplementary Table 1. The 25 μl PCR reactions included 1 μl DNA, 1 μl 10 μM forward primer, 1 μl 10 μM reverse primer, 1 μl 5 U μl−1 Taq polymerase, 5 μl PCR master mix and 16 μl H2O. The PCR protocol was: 37 cycles of denaturation at 95 °C for 30 s, primer annealing at 63 °C for 35 s and extension at 72 °C for 1 min and 30 s. Mutations in the HVS region were defined by aligning and comparing the sequences to the revised Cambridge Reference Sequence49, 50 using MEGA v.6.51
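
The comparison step, in which mutations are defined relative to a reference sequence, can be illustrated with a short sketch. It is not a substitute for the alignment performed in MEGA: the function names are hypothetical, the two sequences are assumed to be already aligned and of equal length, indels are ignored, and the toy sequences below are invented for illustration and are not taken from the revised Cambridge Reference Sequence.

```python
def call_variants(reference, sample, start):
    """List substitutions as (position, reference_base, sample_base).

    Assumes both sequences are pre-aligned and of equal length;
    `start` is the nucleotide position of the first base.
    Ambiguous calls ("N") in the sample are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    for offset, (ref_base, obs_base) in enumerate(zip(reference, sample)):
        if obs_base != ref_base and obs_base != "N":
            variants.append((start + offset, ref_base, obs_base))
    return variants


def is_transition(ref_base, obs_base):
    """True for purine-purine or pyrimidine-pyrimidine substitutions."""
    pair = {ref_base, obs_base}
    return pair <= {"A", "G"} or pair <= {"C", "T"}


# Invented toy sequences, numbered from position 16000 for illustration.
toy_reference = "ACGTAC"
toy_sample = "ACATAT"
for position, ref_base, obs_base in call_variants(toy_reference, toy_sample, 16000):
    kind = "transition" if is_transition(ref_base, obs_base) else "transversion"
    print(position, ref_base, obs_base, kind)
```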

Haplogroup assignment

All HVS haplotypes and associated mtDNA SNPs were also analyzed with the haplogroup prediction program, HaploGrep,52 based on Build 16 of the reference human mtDNA phylogeny (http://phylotree.org),53 and confirmed via manual interpretation using the same PhyloTree Build. Haplogroups were initially assigned using HaploGrep, although, upon inspection, some dubious haplogroup assignments emerged (e.g., haplogroup D for some of those samples within macrohaplogroup M). Each sample was therefore rechecked manually to verify and, where necessary, improve the predictions made by HaploGrep. In cases where mtDNA sequences did not possess the necessary diagnostic SNPs and HVS motifs for adequate haplogroup assignment, samples were denoted by their basal haplogroup status; for example, S* and P*.

Data analysis

Median-joining networks were constructed from HVS haplotype and mtDNA SNP data using Network version 4.6.1.2 (http://www.fluxus-engineering.com/sharenet.htm; Bandelt et al.54). HVS sequences were combined with published Aboriginal Australian HVS data.6, 21, 33, 38

Population pairwise genetic distances (FST values) were calculated from HVS-I haplotypes using the Arlequin v.3.5 package55 and their significance was assessed by 1000 bootstrap simulations. Haplotype diversity and sampling variance were calculated using the formula of Nei.56 Multidimensional scaling (MDS) analysis of indigenous mtDNA haplogroup frequencies of the States of Australia was performed using SPSS v.21.0 software (SPSS, Inc., Chicago, IL, USA).57 An MDS analysis of haplogroup frequencies of the entire Aboriginal Australian sample compared with those from neighboring Island Southeast Asian (ISEA) and Oceanian populations,47, 58, 59 African populations60 and European populations61 was also performed.
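
As an illustration of the haplotype diversity measure, the sketch below computes Nei's unbiased estimator, H = n/(n − 1) × (1 − Σp_i²), where p_i is the frequency of the i-th haplotype in a sample of n sequences. The function name and the toy haplotype labels are hypothetical, and the sampling variance is not shown.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity for a list of haplotype labels."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two sequences are required")
    counts = Counter(haplotypes)
    # Sum of squared haplotype frequencies.
    sum_sq = sum((count / n) ** 2 for count in counts.values())
    return n / (n - 1) * (1 - sum_sq)


# Toy sample: four sequences, three distinct haplotypes (invented labels).
print(round(haplotype_diversity(["h1", "h1", "h2", "h3"]), 4))
```

The estimator equals 1 when every sequence in the sample is distinct and 0 when all sequences are identical.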

Estimation of the TMRCA of haplogroups was performed using the ρ-statistic, as calculated in the software Network.62 The ρ-statistic yields TMRCA estimates based on an average mutation rate and an inferred ancestral haplotype, and does not incorporate a specific demographic model. However, the resultant TMRCAs can be underestimates since the method sometimes fails to distinguish between haplotypes that are identical by state rather than by descent. The mutation rate used was that of Soares et al.,63 where 1 transition occurs every 16 677 years for the region 16 051 to 16 400 (excluding nucleotide positions 16 182, 16 183 and 16 194).
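
The arithmetic behind a ρ-based age can be sketched as follows. Here ρ is taken as the mean number of transitions separating each sampled sequence from the inferred ancestral haplotype, and the age is ρ multiplied by the rate of one transition per 16 677 years. The function names and the toy distances are hypothetical. In practice Network derives the distances from the reconstructed network, whereas this sketch simply takes them as input, and no standard error is computed.

```python
YEARS_PER_TRANSITION = 16677  # rate of Soares et al. for positions 16051 to 16400


def rho(distances):
    """Mean number of mutations from the ancestral haplotype to each sample."""
    if not distances:
        raise ValueError("at least one sampled sequence is required")
    return sum(distances) / len(distances)


def tmrca_years(distances, years_per_mutation=YEARS_PER_TRANSITION):
    """Point estimate of the TMRCA in years (no confidence interval)."""
    return rho(distances) * years_per_mutation


# Toy example: four sequences lying 2, 3, 3 and 4 transitions from the root.
toy_distances = [2, 3, 3, 4]
print(rho(toy_distances))          # 3.0
print(tmrca_years(toy_distances))  # 50031.0
```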

Results

Non-indigenous mtDNA haplogroups and admixture in Aboriginal Australians

Sequencing the HVS and genotyping coding region SNPs in 594 Aboriginal Australian mtDNAs revealed that 133 (22.4%) individuals could be assigned to non-indigenous haplogroups. Of these, the vast majority (98%) was of European origin (Table 1a and Supplementary Table 2), which is explained by historical admixture. The frequency of non-indigenous haplogroups varied across Australia, with New South Wales having the highest (of States with >10 sampled individuals) and the Northern Territory the lowest (Table 1a). An MDS plot based on FST distances of mtDNA haplogroup frequencies in ISEA, Oceanian, European and African populations confirmed the close affinities of Australian individuals with these non-indigenous haplogroups to Europeans (Supplementary Figure 1). The most frequent non-indigenous haplogroup was H (9.3% of the total sample).

Table 1a Distribution of mtDNA haplogroups in Aboriginal Australian populations after SNP typing

Indigenous mtDNA haplogroups

The remaining 461 samples (77.6%) were classified as belonging to indigenous Aboriginal Australian mtDNA haplogroups (Table 1b), and are the subject of the subsequent analyses.

Table 1b Distribution of indigenous mtDNA haplogroups in Aboriginal Australian populations after SNP typing

Macrohaplogroup M

Macrohaplogroup M lineages accounted for 122 (26.4%) of the 461 samples classified as indigenous. Forty-three individuals (9.3% of indigenous sequences) were assigned to haplogroup M42a and 12 individuals (2.6%) to haplogroup Q. Haplogroup M42a was present in all eastern mainland States and the Northern Territory (Figure 2). The 12 haplogroup Q individuals from New South Wales and Queensland were subclassified as either Q1 or Q3. HVS sequence data for 67 (54.9%) M individuals, however, did not allow subhaplogroup assignment with confidence. These individuals were labeled M*(xM42a,Q), but considered indigenous given that the HVS haplotypes could not be assigned to Melanesian M subhaplogroups M27, M28 or M29.6, 64, 65 All States, except South Australia and Tasmania, had individuals belonging to this unresolved paragroup.

Figure 2

Geographic distribution and frequencies of indigenous Australian mtDNA haplogroups. NSW, New South Wales; NT, Northern Territory; QLD, Queensland; SA, South Australia; TAS, Tasmania; VIC, Victoria; WA, Western Australia. The sizes of the pie charts are proportional to the sample sizes. A full color version of this figure is available at the Journal of Human Genetics journal online.

In the network, HVS-I haplotypes belonging to haplogroups M42a and Q formed well-defined clusters (Figure 3). Distinct clades could be identified within M42a, suggesting the existence of distinct subhaplogroups within it. Q haplotypes also clustered and were separated by a long branch from the rest of the haplotypes in this network. The M*(xM42a,Q) haplotypes were widely dispersed, despite some limited clustering being evident. Ten individuals from Queensland shared HVS-I (and HVS-II) haplotypes with two of the M42*(xM42a,Q) individuals from the Northern Territory reported by Ballantyne et al.41 Also, two M*(xM42a,Q) individuals, one from the Northern Territory and the other from Western Australia, clustered with the HVS-I and HVS-II haplotype of the published haplogroup M15 individual.6 Overall, geographical clustering of haplotypes within M (including Q) at the State/Territory level was substantial (Figure 3).

Figure 3

Median-joining networks based on 68 haplogroup M and Q HVS-I haplotypes from the present study and published M14, M15 and M42a haplotypes. Nodes represent haplotypes, with areas proportional to their frequency. Lines represent mutational steps between haplotypes. Nodes are colored according to (a) (sub)haplogroup affiliation and (b) geographic origin. A full color version of this figure is available at the Journal of Human Genetics journal online.

Macrohaplogroup N

Haplogroups N*, N13, O, S and P within macrohaplogroup N together comprised the majority (339 or 73.5%) of the 461 individuals with indigenous haplogroups. Eleven individuals (3%; from the Northern Territory and Queensland) could not be assigned to a recognized N subtype, and were denoted N*(xN13,O,S,R). Haplogroup N13 was detected in three individuals from Queensland; one shared its haplotype with a previously reported individual from Western Australia,6 while the other two shared a novel haplotype. Haplogroup O had a wide distribution, being present in New South Wales, the Northern Territory, Queensland and South Australia (Figure 2), with some geographic structure evident (Figure 4). The majority of haplogroup O haplotypes (88.8%) could not be assigned to the O1 subtype (so far, the only defined subhaplogroup of O), suggesting that a novel subtype(s) may exist within this clade.

Figure 4

Median-joining networks based on 24 haplogroup N and O HVS-I haplotypes from the present study and published haplotypes. Nodes represent haplotypes, with areas proportional to their frequency. Lines represent mutational steps between haplotypes. Nodes are colored according to (a) (sub)haplogroup affiliation and (b) geographic origin. A full color version of this figure is available at the Journal of Human Genetics journal online.

Haplogroup S comprised 104 individuals (22.6% of indigenous samples). The most frequent subtype was S1 (52.8% of S individuals). However, a large proportion (33.6%) could not be assigned to any of the currently known S subtypes (S1–S5) based on HVS data, suggesting that these known subtypes have diverged further than currently recognized and/or that novel S subtype(s) exist. The two distinct S1 clusters in the haplotype network (Figure 5) suggest the existence of at least two subhaplogroups within S1 (so far, one subclade, S1a, is defined in PhyloTree). Haplogroup S5 was absent from the present study, but S3 and S4 were found at low frequencies. There was some clustering of the haplotypes within S sublineages, in particular for S1a. Geographically, haplogroup S is very widely dispersed, being observed in all States and the Northern Territory, with minimal sharing of haplotypes across the States (Figure 5).

Figure 5

Median-joining networks based on 77 haplogroup S HVS-I and HVS-II haplotypes from the present study and published haplotypes. Nodes represent haplotypes, with areas proportional to their frequency. Lines represent mutational steps between haplotypes. Networks are colored according to (a) (sub)haplogroup affiliation and (b) geographic origin. A full color version of this figure is available at the Journal of Human Genetics journal online.

The most common indigenous haplogroup was P, which comprised 40.9% of the individuals with indigenous haplogroups after the removal of individuals with haplogroups P1 and P3b of assumed New Guinean origin. Haplogroup P was found in all States except Western Australia (its absence there can be attributed to the small sample size for that State since Hudjashov et al.6 reported its presence there). The contribution of New Guinean sublineages P1 and P3b was minor (6.3% of haplogroup P individuals). Notably, all those individuals carrying P1 or P3b mtDNAs were individuals with maternal ancestry tracing back to the Torres Strait Islands, which lie between New Guinea and the Australian mainland. The most common P subtype was P3a (28.3% of haplogroup P individuals). Most importantly, a significant proportion of Australian haplogroup P individuals (88 or 43%) could not be assigned to a known P subhaplogroup based on HVS data (Figure 6), and were accordingly denoted as P*. There was a lack of clustering of haplotypes within particular P sublineages (excluding haplogroups P1, P3a and P3b) due to the high level of diversity of the haplotypes within the subtypes. Overall, many P haplotypes were shared among individuals of the same region, with >80% being geographically localized across Australia.

Figure 6

Median-joining networks based on 109 haplogroup P HVS-I and HVS-II haplotypes from the present study and published haplotypes. Nodes represent haplotypes, with areas proportional to their frequency. Lines represent mutational steps between haplotypes. Networks are colored according to (a) (sub)haplogroup affiliation and (b) geographic origin. A full color version of this figure is available at the Journal of Human Genetics journal online.

Diversity indices

When comparing the regions within Australia (i.e., those with sample sizes >10), the Northern Territory had the highest level of mtDNA diversity and Victoria the lowest. Among the 461 HVS-I sequences belonging to indigenous haplogroups, there were 235 haplotypes (50.9% discrimination), and of these, 119 were singletons (50.6%) and another 57 (24.3%) were shared only within a State or the Northern Territory. Combined, these 176 (74.9%) haplotypes were restricted to a particular State or the Northern Territory. The remaining 59 haplotypes were observed in multiple States and/or the Northern Territory, with the majority shared between Queensland and the Northern Territory. The most diverse (in terms of HVS variation) haplogroups were S and P, both of which had diversity values >0.940 (Supplementary Table 3a).
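
The haplotype-sharing categories used above (singletons, haplotypes shared only within one region and haplotypes observed in more than one region) can be tallied as in the following sketch. The function name, the input layout and the toy records are hypothetical and are not drawn from the study data.

```python
from collections import defaultdict


def sharing_summary(samples):
    """Summarize haplotype sharing from (haplotype, region) pairs."""
    counts = defaultdict(int)    # number of individuals per haplotype
    regions = defaultdict(set)   # regions in which each haplotype occurs
    total = 0
    for haplotype, region in samples:
        counts[haplotype] += 1
        regions[haplotype].add(region)
        total += 1
    if total == 0:
        raise ValueError("at least one sample is required")
    return {
        "haplotypes": len(counts),
        "discrimination": len(counts) / total,
        "singletons": sum(1 for h in counts if counts[h] == 1),
        "within_region_only": sum(
            1 for h in counts if counts[h] > 1 and len(regions[h]) == 1
        ),
        "across_regions": sum(1 for h in counts if len(regions[h]) > 1),
    }


# Invented toy records: six individuals carrying four haplotypes.
toy = [("h1", "QLD"), ("h2", "QLD"), ("h2", "QLD"),
       ("h3", "QLD"), ("h3", "NT"), ("h4", "VIC")]
print(sharing_summary(toy))
```

In this tally, singletons and within-region haplotypes together give the number of haplotypes restricted to a single region.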

Analysis of molecular variance based on haplogroup frequencies demonstrated that variation among the States was small but statistically significant (FST=0.013, P<0.05) (Tasmania and Western Australia were excluded because of low sample sizes). Analysis of molecular variance based on HVS sequence data gave a similarly small value that was statistically highly significant (FST=0.0113, P<0.0001). An MDS analysis of the FST values based on haplogroup frequencies between States indicated that on dimension 1, the largest distance was between the Victoria and Northern Territory populations, with those of Queensland and New South Wales approximately in the middle, while in dimension 2 the Queensland population was separated from the rest (Supplementary Figure 2).

TMRCA estimates

In all instances, the ages of indigenous Australian haplogroups were very old, but depending on the mutation rate used there were marked differences among some of them (Table 2). For example, using the rate of Soares et al.,63 the median TMRCA estimates were 49 KY (range 39–60 KY) for haplogroup S and 39 KY (range 26–53 KY) for haplogroup O. The Australian-specific P lineages, including the P* paragroup, were estimated to be greater than 55 KY (range 48–69 KY) in age. The TMRCAs of the haplogroups were compared with published estimates, where these were available (Table 2).

Table 2 TMRCA estimates for indigenous haplogroups analyzed in this and other published studies

Discussion

This study represents the largest mtDNA database of Aboriginal Australians yet assembled with respect to both sample size and geographical coverage. It includes samples from the hitherto very poorly represented or unrepresented States of Queensland (including the Torres Strait Islands), Victoria and Tasmania, along with additional samples from the Northern Territory.

The level of introgression of European mtDNA lineages in this Aboriginal sample is considerably smaller (~20%) than the level reported in Y-chromosome studies of ~80%.37, 39 There may have been a loss of Aboriginal Australian mtDNA diversity as a result of European settlement since the late eighteenth century, although the full extent remains unknown. The introduction of exotic disease, the forced removal of Aboriginal Australians from their ancestral homelands into government or religious settlements, restrictions on marriage and the progressive exodus from rural areas into cities for employment have all influenced the findings of this study.

As a result of these processes, much of the language/clan structure illustrated by Horton46 has dissolved, especially in the eastern States of Australia. For example, Aboriginal Australians of Queensland were removed from their native homelands and placed into large government settlements. Field research by Catherine Tennant-Kelly66 recorded that, in a population of 900 Aboriginal Australians, 28 different tribal groups were represented in the government Cherbourg settlement (located in southeast Queensland), a finding that illustrates the disturbance of traditional population structure. Any attempts at reconstructing the past through interpretations of genetic patterns in Aboriginal diversity today must be cognizant of the treatment of Aboriginal people by European settlers.14

Of some concern was the possible inclusion of close maternal relatives in the sample. Although there was some sharing of haplotypes within haplogroups, the impact of relatives in our sample was probably minimal. In cases where haplotype sharing within haplogroups was observed, it was not possible to clarify the degree of relatedness of the individuals, which could be close or distant given the relatively low mtDNA mutation rate. The widespread sharing of mtDNA haplogroups across regions can most readily be explained by the traditional practice of patrilocality, in which the bride moves to the place of her husband and his clan upon marriage.67 Although individuals were characterized broadly by State or Territory, rather than by language or social groups as defined by Horton,46 there was evidence for some geographic clustering of haplotypes within the States, with some signals of female movement between them.

The Aboriginal population showed a high level of HVS-I sequence diversity, which is most probably a function of the merging of groups from all over the continent. However, only one diversity measure was used in this study and different measures will give different values. Aboriginal Australians exhibited one of the highest levels of HVS-I diversity in ISEA and Oceania, with only Bali and Banjarmasin (Borneo) having diversity values >0.987 (Supplementary Table 3b).

TMRCA estimates

The TMRCAs of the haplogroups covered by the present study suggest that the mtDNA haplogroups in Aboriginal Australians are older than previously calculated, but closest to the ages reported most recently by van Holst Pellekaan,14 which were themselves much older than prior estimates (Table 2). The estimates of Behar et al.,44 van Holst Pellekaan14 and Hudjashov et al.6 were based on coding-region sequences only (~15 kb) and not the HVS-I, and used a variety of mutation rates.31, 35, 68 The estimates of van Holst Pellekaan14 and Hudjashov et al.6 were calculated using the ρ-statistic, whereas Behar et al.44 implemented PAML 4.4.69

The coalescence of the haplogroups based on the rate of Soares et al.63 fits reasonably well with the currently accepted latest date of the colonization of Sahul based on archeological evidence, 47 KYA.4 However, haplogroups S and M42a are older than 47 KY (Table 2). The age of haplogroup P is consistent with it being one of the oldest mtDNA lineages in the region, evolving within Sunda just before human populations entered Sahul via New Guinea.

The TMRCAs of indigenous haplogroups that are much older than 47 KY can be explained by at least two scenarios. In the first, haplogroups S, O and M42a evolved in group(s) living in Sunda shortly before migration to Australia and subsequently became extinct, or are yet to be found in modern populations in ISEA. Alternatively, Australia was settled considerably earlier than 47 KYA and evidence of this settlement may lie submerged under the Timor and Arafura Seas off the northern West Australian coast, because of a subsequent rise in sea level. We consider the second scenario to be much more likely. All Australian-specific mtDNA haplogroups analyzed in the present study are considered ancient and geographically widespread across the continent. Resolving the competing hypotheses on settlement dates requires improvement in the accuracy of TMRCA estimates, and this may be done through the resequencing of more mtDNA genomes and the use of other mutation rates, such as that of Fu et al.,70 which is based on an ancient DNA calibration.

The maternal ancestors of the Aboriginal Australians carried M* and N* (including R*) lineages through Sunda into Sahul most probably between 40 and 60 KYA (according to median values calculated here), as supported by both archeological and paleontological evidence of human activity in Sunda before 40 KYA.71, 72 M haplogroups observed in Aboriginal Australians today appear to be only very distantly related to M lineages outside Australia, including those observed in New Guinea6, 64, 73 and India.74, 75 Haplogroup M42a is widely accepted as indigenous to Australia, with M42a distantly related to M42b, which is found in the South Indian tribal groups, but with a divergence time of at least 55 KYA.74, 75

However, the findings of the present study indicated that the status of haplogroup Q as indigenous to Australia, at least to the Australian mainland, remains contentious.6 All of the haplogroup Q individuals in this study identified themselves as possessing Torres Strait Islander maternal ancestry, mainly from the islands of Mer (Murray), Darnley and St Pauls. Torres Strait Islanders have a distinct culture, combining elements from New Guinea, Aboriginal Australia, the Solomon Islands and Melanesia.76 Although haplogroups Q1 and Q3 were present in Torres Strait Islanders, we did not find the West Australian Aboriginal Q2b subtype observed in a single individual by Hudjashov et al.6 The presence of Q1 and Q3 in Australia thus appears to be limited to groups with ancestry in the Torres Strait and possibly communities of Far North Queensland that have regular contact with the former.

Those haplotypes that could not be assigned to a particular M subhaplogroup and were denoted M*(xM42a,Q) are considered indigenous to Australia, as they could not be assigned to M lineages found in any of the neighboring populations (such as M29 found in New Guinea).6, 34, 64 M lineages are also virtually absent from the European populations that have settled Australia during the last three centuries. These M*(xM42a,Q) individuals most likely carry completely novel M subhaplogroups restricted to Australia, or possess variants of known Australian M subhaplogroups, such as M14 or M15. Some, if not all, individuals denoted as M*(xM42a,Q) may belong to the M42*(xM42a) haplogroup identified by Ballantyne et al.41 as supported by the matching of haplotypes of the two respective haplogroups. However, the present study did not screen samples for the nucleotide position 9156 SNP that characterizes haplogroup M42, and, therefore, it is uncertain whether these individuals belong to this lineage. Accordingly, these samples require full mitogenome sequencing to determine their detailed haplogroup affiliation.

Haplogroups within macrohaplogroup N account for the majority of Aboriginal lineages. Haplogroups N13, S and O have so far not been found outside Australia, although some studies focusing on populations in proximity to Australia have neglected to type SNPs diagnostic of these haplogroups because of their assumed restricted distribution.58, 77 Given the high frequency and wide distribution of haplogroup S within Australia, it is thought that this haplogroup evolved on the background of an ancestral haplogroup N* mitogenome within Australia, and has spread throughout the continent, to as far south as Tasmania,38 which was connected to the Australian mainland by a land bridge until ~12 KYA.78 Tasmanian haplogroup S HVS-I and HVS-II haplotypes observed in the present study are identical to that of ‘Fanny Cochrane’.38 Interestingly, this Tasmanian haplogroup S haplotype was not found in mainland Australia, suggesting the potential isolation of Tasmanian Aboriginals from communities of mainland Australia.78

Haplogroup P evolved in Sunda just before its carriers moved into Sahul, with the P lineages (P9 and P10) being found in the Aeta or Agta people of the Philippines.77, 79, 80 It is, therefore, most likely that P was brought into Sahul via present-day New Guinea some 55–60 KYA and, eventually, some of the descendants spread into Australia, which today contains a number of indigenous P subtypes (P5, P6, P7 and P8), as well as some distantly shared with New Guinea (P3 and P4).6, 22, 31, 81 The genotyping protocol used in this study was such that P5, P6, P7 and P8 were not discernible from mtDNA SNP and HVS sequence data. The two Tasmanian haplogroup P HVS-I and HVS-II sequences were identical to those of the ‘Wyerlooberer’ lineage (Pipers River tribe) of Tasmania.38

Haplogroup P3 and P4 subtypes have been found in Australia as well as in New Guinea, with P3a and P4b being restricted to Australia and P3b and P4a restricted to New Guinea, suggesting that these respective subtypes evolved in situ within Australia and New Guinea separately.6, 14 The divergence of New Guinean and Aboriginal Australian P mtDNAs is estimated to have occurred at least 30 KYA,22, 31, 36, 81 even though the two groups inhabited the single landmass of Sahul until around 6 to 8 KYA, when they became separated.82, 83 The resequencing of Aboriginal Australian Y chromosomes supported the ancient connection between New Guineans and Aboriginal Australians implied by mtDNA evidence, indicating that they had diverged at least 48 KYA.40 Additionally, whole-genome analysis of present-day Aboriginal Australian and New Guinean populations suggests that the two populations diverged at least 37 KYA.13 The persistence of haplogroups P and Q and their subtypes in the indigenous populations of Sunda and Sahul,58, 79, 81 therefore, suggests a shared but deep ancestry between populations of the two former landmasses.
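The shared-versus-restricted reasoning applied to the P subtypes can be illustrated with a minimal sketch. It is not the analysis pipeline used in this study: the function names (`parent`, `classify`) and the data layout are assumptions made for illustration, and the subtype lists simply restate the distributions described in the text rather than any complete survey.

```python
# Illustrative sketch only. Subtype lists restate the distributions named in
# the text; they are not a complete inventory of either region.
AUSTRALIA = {"P3a", "P4b", "P5", "P6", "P7", "P8"}
NEW_GUINEA = {"P3b", "P4a"}


def parent(subtype: str) -> str:
    """Return the parent haplogroup by dropping a trailing lowercase letter.

    'P3a' -> 'P3'; a label without a letter suffix ('P5') is returned as-is.
    This naming rule is a simplifying assumption for the sketch.
    """
    return subtype[:-1] if subtype[-1].islower() else subtype


def classify(region_a: set, region_b: set) -> dict:
    """Separate deeply shared parent haplogroups from region-restricted subtypes."""
    parents_a = {parent(s) for s in region_a}
    parents_b = {parent(s) for s in region_b}
    return {
        # Parent haplogroups present in both regions (shared, deep ancestry).
        "shared_parents": sorted(parents_a & parents_b),
        # Subtypes observed in one region only (candidate in situ evolution).
        "only_a": sorted(region_a - region_b),
        "only_b": sorted(region_b - region_a),
    }


result = classify(AUSTRALIA, NEW_GUINEA)
print(result)
```

Under these assumed inputs, P3 and P4 appear as shared parent haplogroups while every individual subtype is restricted to one region, which is the pattern interpreted above as shared deep ancestry followed by separate in situ diversification.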

There are two main hypotheses concerning the routes taken by the founding populations to colonize Sahul. The postulated Northern Route took seafarers from Sunda via passage across Wallacea to Sulawesi, from which the colonists crossed a series of smaller islands before arriving at either the Bird’s Head Peninsula (New Guinea) or the exposed Sahul shelf near the islands of Aru, to the west of New Guinea.11 Under the Southern Route hypothesis, seafarers from the southern tip of Sunda moved along the island chain of Nusa Tenggara to the island of Timor. From Timor, the ancestors crossed the ~80 km wide Timor Trough to reach the then exposed northwestern tip of Australia (now submerged under the Timor and Arafura Seas),11 hence bypassing New Guinea.4, 84

Hudjashov et al.6 postulated a single route of migration with entry to New Guinea and subsequent spread into Australia based on both Y-chromosome and mtDNA data. The findings of the present study give support to settlement via both the Northern and Southern routes, on the basis that some mtDNA lineages are restricted to Australia, while others are shared with New Guinea. A similar picture is seen for Y-chromosome variation in Aboriginal Australians, with both Australian-specific Y lineages (C-M347 and S-P308) and lineages shared with New Guinea (K-M526 and M-M186)20, 39, 40 being observed.

Using ethnographic, anthropological, archeological and genetic data, Williams5 modeled the number of initial colonizers of Australia to be at least 1000 to 2000 individuals who arrived ~50 KYA. Under this scenario, autochthonous lineages such as M42a, S and O and potentially M14, M15 and N13 evolved within Australia some time after the arrival of the ancestors carrying M* and N* mitogenomes into Northern Australia. However, the entry point to Australia of the ancestors carrying P*, P3* and P4* is much more likely to have been via present-day New Guinea. Most importantly, under either model of migration, there is no evidence of back migration of females carrying Australian-specific mtDNA lineages into New Guinea after they arose within Australia.14, 23, 25, 31, 84

Adding to the complexity of any postulated scenario are the possible effects of the expansion of Austronesian-speakers throughout ISEA and then via coastal New Guinea into Oceania and the Pacific some 8 to 12 KYA.58 This movement of people was so extensive that it might have replaced many of the ancient mtDNA genomes in those regions, further confounding attempts to retrace migration routes of the ancestors of Aboriginal Australians through these regions. The data from this study suggest a marked isolation between New Guinea and Australia despite their physical contiguity. When comparing the mtDNA haplogroup composition of populations of ISEA,58 East Timor,47 New Guinea59 and Australia, the MDS analysis suggested that Aboriginal Australians are distant from the populations of ISEA and East Timor, with their nearest neighbors being New Guineans (Supplementary Figure 1).

Controversy remains as to the extent to which Aboriginal Australians remained isolated until historical times. It has been proposed that, during the mid-Holocene, the appearance and extensive use of microliths, the spread of the Pama-Nyungan language family throughout the Australian continent (except for the far north of the Northern Territory) and the arrival of the dingo could be attributed to a South Asian invasion event.85, 86, 87 A recent genetic analysis also suggested that ~11% of Aboriginal Australian genomes could be attributed to an Indian invasion event that dated to ~4 KYA.8 However, this Indian connection has been challenged by both anthropological and archeological evidence, as discussed in Brown.88 Archeological evidence suggests that microlith use by Aboriginal Australians occurred well before the mid-Holocene and was not introduced from outside the region.45 In addition, the origin and spread of the Pama-Nyungan language family ~5 KYA remain uncertain, although its origin is thought to be indigenous as it shares distant ties with languages in New Guinea, rather than those of India.89

Furthermore, there has been increased interest in deciphering the origins of the Australian dog, the dingo.90, 91, 92, 93, 94, 95 Skeletal evidence from sites in New South Wales, Victoria and Western Australia suggested that the dingo was introduced into Australia during the Holocene ~3.5 KYA,96, 97, 98 possibly by the Toalean or other hunter-gatherers from South Sulawesi.92 If so, those responsible for the introduction of the dingo might be expected to have left human genetic footprints as well. However, the findings from mtDNA analyses (including the present study) and Y-chromosome studies7, 39, 40 clearly exclude introgression of these loci from groups living in Sulawesi, southern India or elsewhere during the mid-Holocene into the Aboriginal Australians sampled.

Evidence from mtDNA and Y-chromosomal DNA further indicates that the settlement of Australia is both ancient and complex. In short, it cannot be fully explained by postulating a single ancestral population entering Sahul ~50 KYA, most likely via present-day New Guinea, and subsequently dispersing into the rest of the landmass.6, 7 Instead, it is more likely that the founding populations that moved into Sunda fissioned, and the resulting sub-populations went in different directions, resulting in a variety of entry points into Sahul. Consequently, these ancient events are reflected today in the observation that different regions of Sahul have unique as well as shared mtDNA lineages.

Conclusion

In summary, this report presents the largest and most comprehensive study of mtDNA lineages of Aboriginal Australians, one that has revealed high levels of mtDNA diversity, with structure found among the States and the Northern Territory. We found no evidence for subsequent introgression events by people from outside Australia (except New Guinea) before the start of European settlement in the eighteenth century, whether by Austronesians or by Southern Indians, as postulated previously. Full mitogenome sequencing is required to determine whether the many unresolved mtDNA haplogroup M*(xM42a,Q), N*, S* and P* individuals found in our study represent other novel (sub)haplogroups. This will provide a fuller understanding of the mtDNA history of Aboriginal Australians.

Web Resources

Accession numbers and URLs for data presented herein are as follows: Fluxus-engineering.com, http://www.fluxus-engineering.com/sharenet.htm (for Network 4.6 software package). GenBank, http://www.ncbi.nlm.nih.gov/genbank/ (for sequences (accession numbers KY192710–KY193685)).