Introduction

The sea nomads of Southeast Asia (SEA) are divided linguistically, culturally and geographically into three groups: the Moken and the related Moklen, the Orang Suku Laut (literally, the ‘Tribe of Sea People’) and the Bajau Laut.1 The Moken, also known as the sea gypsies of the Andaman Sea, now number 2000–3000 individuals.2 They inhabit the Mergui Archipelago off the coast of Myanmar and Thailand (Figure 1), where until recently they spent most of the year on their boats (kabang) and subsisted through maritime foraging. During the wet monsoon season (mid-May to October), they built temporary shelters on the coast and relied on food gathering in the forests and on trade for subsistence.3, 4, 5, 6, 7 Ethnographic records of the Moken are relatively abundant owing to extensive surveys during the British colonial period in the nineteenth and twentieth centuries, but their origins remain largely speculative.

Figure 1

Map showing the location of the Mergui Archipelago within Southeast Asia and sampling locations.

The Moken language belongs to the Malayo-Polynesian branch of the Austronesian language family.8, 9 Although anatomically modern humans first reached and settled island Southeast Asia (ISEA) during the Pleistocene, 45 000–50 000 years ago, the ancestry of most modern ISEA populations is linguistically linked to the mid-Holocene range expansion of the Austronesian language family. Today, Austronesian languages are spoken throughout ISEA and the Pacific: from Madagascar in the west to Rapa Nui (Easter Island) in the east, and from Taiwan in the north to New Zealand in the south.10 According to archaeo-linguistic reconstructions, pre-Austronesian languages likely originated on the southern Chinese mainland 5000–6000 years ago among agriculturalists who migrated to Taiwan, where the Austronesian languages fully developed.11 This argument is supported by the linguistic diversity within Taiwan, where 9 of the 10 branches of the language family are spoken. Aided by rice cultivation and skilled maritime navigation,12 Austronesian speakers are thought to have spread rapidly from Taiwan to the Philippines, Indonesia, parts of Melanesia and the rest of the Pacific beginning 4000 years ago, biologically assimilating to various degrees with long-resident indigenous populations along the way.13

Recent studies of mitochondrial DNA (mtDNA) variation have greatly expanded upon this two-tiered model of ISEA colonization by emphasizing the movement of indigenous populations in Sundaland, the landmass exposed by low Pleistocene sea levels that encompassed the Malay Peninsula, Sumatra, Java and Borneo. These migrations were likely prompted by rising sea levels that drowned up to half of the region's land area between 15 000 and 7000 years ago.14, 15, 16 The high level of mtDNA diversity in this region today is characterized by a large number of indigenous clades, of which up to 20% were dated to the initial Pleistocene colonization about 50 000 years ago, and another 20% to the mid-Holocene Out-of-Taiwan expansion supported by linguistic and archeological evidence. The remaining majority of mtDNA lineages were dated to between 5000 and 15 000 years ago and associated with the expansion and movement of indigenous populations within ISEA and from mainland SEA into ISEA.15 This period coincides with the postglacial rise in sea level that flooded Sundaland, prompting migration to higher altitudes or, for those who remained at lower altitudes, the adoption of stilt or boat housing and sea-based subsistence strategies. Rising sea levels could also explain the dispersal of the Austronesian languages if the homeland of the early Austronesian speakers lay not in Taiwan but in SEA or ISEA, with Taiwan representing an early offshoot of Austronesian voyagers and its linguistic diversity reflecting isolation rather than language origins.15, 16, 17, 18

Moken ethnohistory asserts ancestral ties to mainland SEA. According to these accounts, Moken ancestors lived in settlements on the Myanmar–Malaya mainland and practiced agriculture, but were driven to the coast by the Burmese to the north and the Malays to the south, and subsequently settled in the Mergui Archipelago.3 Continual raids by pirates further forced the Moken to adopt a sea-based lifestyle to avoid capture.5 Alternatively, Ivanoff6 has suggested that the Moken originated in China 4000 years ago and split off from other migrating groups as late as the early seventeenth century. In contrast, similarities in subsistence patterns between the Moken and other sea nomads of the region led Sopher4 to propose an ancient common origin of these groups in ISEA, which implies a northward maritime migration of the Moken to their present location in the Mergui Archipelago.5 Initial studies of Moken underwater visual acuity suggested long-term adaptations consistent with an extensive history of maritime subsistence,19 but more recent work indicates that these skills reflect short-term acclimation20 and so would also be consistent with a recent adoption of a sea-based lifestyle. Thus, the origins of the Moken remain unresolved.

To investigate the maternal origin of the Thai Moken, we analyzed mtDNA variation from both the hypervariable segment I (HVSI) and whole genome sequences. Available mitochondrial sequences of neighboring mainland SEA and ISEA populations from recent studies were included for comparative purposes.14, 15, 16, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31

Materials and methods

Subjects

Twelve Moken adults contacted in early 1994 on the Surin Islands were included in this study. These individuals were born on four different islands of the Mergui Archipelago: Dung (n=6), Lampi (n=1), Jadiak (n=4) and Pulao (n=1) (Figure 1). At the time of collection, there were 201 Moken individuals from 45 households on the Surin Islands, originally from various islands farther north in the Mergui Archipelago.7 The sample included two paternally related individuals, but participants are believed to be maternally unrelated on the basis of self-reported pedigrees. Plucked hair samples were collected for DNA extraction with informed consent. This study was approved by Binghamton University's Human Subjects Research Review Committee.

DNA extraction

DNA was extracted from plucked hair using an abbreviated silica protocol32 previously described by Lum et al.33 Briefly, hair roots were washed with ethanol, rinsed with distilled water and incubated in 40 μl of silica and 900 μl of L6 buffer (4.5 M guanidinium thiocyanate, 0.1 M Tris–Cl (pH 6.4), 20 mM EDTA and 1.1% Triton X-100) at room temperature for 10 min. After centrifugation, the pellet was washed twice with 900 μl L2 buffer (5 M guanidinium thiocyanate, 0.1 M Tris–Cl (pH 6.4)), twice with 500 μl 70% ethanol and once with 500 μl acetone. After drying, DNA was eluted from silica in 1 ml Tris–EDTA buffer (10 mM Tris–Cl (pH 8.0) and 1 mM EDTA).

Mitochondrial HVSI sequencing and phylogenetic analysis

The mitochondrial HVSI was amplified using primers L15996 and H16401.34 PCR products were purified using the Millipore Manu03050 Filter Plate (Millipore, Billerica, MA, USA), and were sequenced in both directions with the BigDye Terminator Kit v3.1 on an ABI 3730xl DNA Analyzer (Applied Biosystems, Foster City, CA, USA). Sample sequences were compared and aligned to the revised Cambridge Reference Sequence35 using SeqScape v2.1.1 (Applied Biosystems).
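The comparison against the revised Cambridge Reference Sequence can be illustrated with a minimal Python sketch that lists each difference as a nucleotide position plus the sample base. It assumes the sample has already been aligned to the reference without insertions or deletions, so that each string index maps to one nucleotide position (np); insertions, deletions and heteroplasmy are not handled. The function name `call_polymorphisms` is hypothetical, and the toy strings below are made up: they are not the real rCRS or any Moken sequence.

```python
def call_polymorphisms(sample: str, reference: str, start: int) -> list[str]:
    """List sample/reference differences as position + sample base (e.g. '16029T').

    Assumes both sequences are already aligned, gap-free and of equal length,
    with index 0 corresponding to nucleotide position `start`.
    """
    if len(sample) != len(reference):
        raise ValueError("sample and reference must be aligned to the same length")
    calls = []
    for offset, (s, r) in enumerate(zip(sample.upper(), reference.upper())):
        # Ignore ambiguous base calls ('N'); report every other mismatch.
        if s != r and s != "N":
            calls.append(f"{start + offset}{s}")
    return calls


# Toy 10-bp stand-ins: NOT the real rCRS or any Moken sequence.
toy_reference = "ACGTACGTAC"
toy_sample = "NCGTATGTAC"
print(call_polymorphisms(toy_sample, toy_reference, start=16024))  # ['16029T']
```

Reporting only the derived base at each position mirrors the compact notation commonly used in HVSI polymorphism tables.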

Unique Moken HVSI lineages (n=2) were identified from 355 base pairs (bp) of sequence (nucleotide positions (np) 16 024–16 378). Unbiased heterozygosity (h) was calculated as $h = \frac{n\left(1 - \sum_i x_i^2\right)}{n - 1}$, where $x_i$ is the frequency of the $i$th haplotype and $n$ is the number of individuals in the sample.36 Because haplogroup assignments for the two Moken lineages could not be resolved unambiguously from HVSI polymorphisms alone, whole mitochondrial genomes were also sequenced (see below). For each Moken HVSI haplotype, a haplogroup-specific median-joining network diagram37 was constructed using Network 4.2.0.1 (http://fluxus-engineering.com) to illustrate the relationship of the Moken to their neighbors14, 15, 22, 24, 27 (Figures 2 and 3).
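The heterozygosity formula (often called Nei's unbiased gene diversity) can be checked with a short Python sketch. Applied to the haplotype counts reported in the Results (11 of 12 individuals carrying MKN1 and one carrying MKN2), it gives $\sum_i x_i^2 = (11/12)^2 + (1/12)^2 = 122/144$ and $h = 12 \cdot (22/144) / 11 = 1/6 \approx 0.167$. The function name `unbiased_heterozygosity` is our own illustrative choice.

```python
from collections import Counter


def unbiased_heterozygosity(haplotypes: list[str]) -> float:
    """Unbiased haplotype diversity h = n(1 - sum(x_i^2)) / (n - 1).

    `haplotypes` holds one haplotype label per individual; x_i is the
    sample frequency of the i-th distinct haplotype.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two individuals are required")
    counts = Counter(haplotypes)
    sum_sq = sum((count / n) ** 2 for count in counts.values())
    return n * (1 - sum_sq) / (n - 1)


# Counts from the Results: 11 individuals carry MKN1, one carries MKN2.
moken = ["MKN1"] * 11 + ["MKN2"]
print(round(unbiased_heterozygosity(moken), 3))  # 0.167
```

The factor n/(n − 1) corrects the downward bias of the naive estimate $1 - \sum_i x_i^2$ in small samples, which matters for a sample of only 12 individuals.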

Figure 2

A median-joining37 network diagram consisting of the major Moken hypervariable segment I (HVSI) (nucleotide positions 16024–16378) lineage (MKN1) and other haplogroup M21 sequences found in Southeast Asia (SEA).14, 15, 24, 27

Figure 3

A median-joining37 network diagram consisting of the minor Moken hypervariable segment I (HVSI) (nucleotide positions 16024–16378) lineage (MKN2) and other haplogroup M46 sequences found in Southeast Asia (SEA).15

Whole mitochondrial genome sequencing and phylogenetic analysis

Assuming that individuals sharing identical HVSI haplotypes were likely to have inherited identical polymorphisms in the rest of the mitochondrial genome, we sequenced whole mitochondrial genomes from two Moken samples, one for each of the two distinct HVSI haplotypes observed (see above). Twenty-four overlapping fragments of approximately 800 bp were amplified using primers and conditions described by Rieder et al.38 PCR products were purified and sequenced, and polymorphisms were identified as described above for HVSI. These sequence data appear in GenBank under accession numbers FJ442938–FJ442939.
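The sample-selection logic described above can be sketched in Python: individuals are grouped by identical HVSI sequence, and one sample per haplotype is chosen for whole-genome sequencing. The sketch rests on the same assumption stated in the text, that identical HVSI haplotypes imply identical polymorphisms elsewhere in the mitochondrial genome. The function names, the sample IDs and the rule of picking the first ID in sorted order are all hypothetical.

```python
from collections import defaultdict


def group_by_haplotype(hvs1_by_sample: dict[str, str]) -> dict[str, list[str]]:
    """Map each distinct HVSI sequence to the sorted IDs of samples carrying it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for sample_id, sequence in hvs1_by_sample.items():
        groups[sequence.upper()].append(sample_id)
    return {sequence: sorted(ids) for sequence, ids in groups.items()}


def pick_representatives(hvs1_by_sample: dict[str, str]) -> list[str]:
    """Choose one sample per distinct HVSI haplotype for whole-genome sequencing.

    Hypothetical rule: take the first sample ID in sorted order. This relies on
    the assumption that identical HVSI haplotypes share the rest of the genome.
    """
    groups = group_by_haplotype(hvs1_by_sample)
    return sorted(ids[0] for ids in groups.values())


# Hypothetical sample IDs and toy sequences.
toy = {"S01": "ACGT", "S02": "acgt", "S03": "ACTT"}
print(pick_representatives(toy))  # ['S01', 'S03']
```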

All whole mitochondrial genome sequences from Asia, Melanesia, Micronesia, Polynesia and South Asia in the Human Mitochondrial Genome Database,39 together with other regional sequences deposited in GenBank, were obtained for initial analyses. Sequences were aligned using the MAVID/AMAP multiple alignment server.40 Phylogenetic analyses of the whole genome data were carried out using PHYLIP 3.6.41 Pairwise genetic distances between the two Moken mtDNA genomes and others from the region14, 16, 21, 23, 25, 26, 27, 28, 29, 31, 35, 42 were estimated with the DNADIST program under the Kimura two-parameter model, using a transition/transversion ratio of 2:1. The 50 most closely related sequences were then retained for further analyses. Bootstrap replicates (1000) were generated with the SEQBOOT program for both the whole mitochondrial genome sequences and the coding region alone, and consensus trees were constructed with the CONSENSE program using the extended majority-rule method. Because the two consensus trees showed similar topologies, only the tree generated from the coding region is shown in Figure 4. The structure of all nodes is shown, but only bootstrap values greater than 50% are presented.
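The distance step can be sketched as follows: a textbook Kimura two-parameter (K2P) distance, $d = -\tfrac{1}{2}\ln(1 - 2P - Q) - \tfrac{1}{4}\ln(1 - 2Q)$, where $P$ and $Q$ are the proportions of transitions and transversions, followed by retention of the k closest sequences (k=50 in the text). This is an illustration, not a reimplementation of PHYLIP. The Methods specify a fixed 2:1 transition/transversion ratio in DNADIST, whereas this sketch estimates P and Q separately for each pair, so its values should not be expected to match DNADIST output. It also assumes pre-aligned, equal-length sequences and simply skips gaps and ambiguous bases; the function names are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance, with P and Q estimated from this pair.

    Compares only aligned sites where both sequences have an unambiguous
    A/C/G/T; gaps and ambiguity codes are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in VALID or b not in VALID:
            continue
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    if transitions == transversions == 0:
        return 0.0  # identical at all comparable sites
    p, q = transitions / sites, transversions / sites
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


def closest(query: str, others: dict[str, str], k: int = 50) -> list[tuple[str, float]]:
    """Return the k sequences nearest to `query` by K2P distance, nearest first."""
    distances = [(name, k2p_distance(query, seq)) for name, seq in others.items()]
    return sorted(distances, key=lambda item: item[1])[:k]


# Toy aligned sequences with hypothetical names, not real mitogenomes.
panel = {"x": "GAAAAAAAAA", "y": "AAAAAAAAAA", "z": "CCAAAAAAAA"}
for name, d in closest("AAAAAAAAAA", panel, k=2):
    print(name, round(d, 4))  # prints "y 0.0", then "x 0.1116"
```

Separating transitions from transversions is the point of the two-parameter model: in mtDNA, transitions accumulate much faster than transversions, so weighting them identically would distort distances between divergent lineages.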

Figure 4

Consensus phylogram illustrating relationships of Moken to Asian and Island Southeast Asian (ISEA) populations, generated from 1000 replicate bootstrap phylogenies of mitochondrial genome-coding region sequences. Original references are indicated for sequences lacking precise geographical identifiers: Ingman and Gyllensten25; Ingman et al.21; Maca-Meyer et al.23; Macaulay et al.27; Mishmar et al.26

Results

Although the sample included Moken individuals born on four islands spanning the range of the Mergui Archipelago, only two distinct mtDNA HVSI lineages were observed (Table 1), resulting in low mtDNA haplotype diversity (h=0.167). The major lineage, MKN1, was shared by 11 of the 12 Moken individuals surveyed and was carried by individuals born on each of the four islands sampled. The minor lineage, MKN2, was identified in only one individual, who was born on Dung. According to the nomenclature of Hill et al.,15 MKN1 belongs to mitochondrial haplogroup M21d, and MKN2 was tentatively assigned to M46.

Table 1 Polymorphisms identified in two Moken mitochondrial genomes

The relationship between MKN1 and other haplogroup M21 sequences found in SEA14, 15, 22, 24, 27 based on HVSI is shown in Figure 2 and Table 2. MKN1 was shared by one individual from Bali, and clustered with another from Bali (ISEA) and sequences from mainland SEA (Zhuang from Guangxi, Dai from Yunnan, northern Thai and Phuthai from northeastern Thailand).

Table 2 HVSI (nucleotide positions 16024–16378) sequence polymorphisms of the major Moken lineage (MKN1) and other SEA haplogroup M21 lineages14, 15, 22, 24, 27

The relationship between MKN2 and other M46 sequences, based on HVSI, is shown in Figure 3 and Table 3. Although MKN2 was not found in other ISEA populations, it was most closely related to the presumed ancestral lineage of sequences found in Borneo and Sumatra (Sundaland). In contrast to the relatively wide geographic range of haplogroup M21, M46 has been found almost exclusively in ISEA.15

Table 3 HVSI (nucleotide positions 16024–16378) sequence polymorphisms of the minor Moken lineage (MKN2) and other SEA haplogroup M46 lineages15

The Moken whole mitochondrial genome sequences revealed one previously unreported control region polymorphism and five previously unreported coding region polymorphisms; all of the latter are synonymous changes (Table 1). Phylogenetic analysis of whole mitochondrial genome sequences is shown in Figure 4. Unambiguous resolution of the affinities of MKN2 awaits more extensive sampling of regional mitochondrial genomes. There is support, however, for the clustering of MKN1 with a Semelai aboriginal Malay sequence, a reflection of the relationships between subclades of the M21 haplogroup.

Discussion

We identified two mitochondrial HVSI haplotypes among the Thai Moken. The low haplotype diversity (h=0.167) is consistent with previous analyses describing relatively low diversity of both mtDNA sequences (0.009±0.007) and nuclear short tandem repeats (0.650±0.030),33 as well as moderate inferred gene flow with populations from the Philippines and low inferred gene flow with the Marianas and Borneo.43 This limited diversity is likely the consequence of genetic drift associated with a historically small population size, exacerbated by the recent split with the Moklen. The Moklen have recently taken up permanent residence along the west coast of Thailand and diversified their resource acquisition to include coastal fisheries, para rubber plantations and daily wage labor. Despite their contrasting subsistence patterns, the fission between the Moken and Moklen is considered recent because their languages remain mutually intelligible.5 The major legends and folklore of the two groups differ in important ways (for example, the Moken regard ‘Sipian and Gaman’ as the legend of their origin, whereas the Moklen regard ‘Grandfather Sampan’ as theirs), but they share similarities that further point to a recent split. Ferrari et al.44 speculated that some Moken were captured as slaves to work on the mainland, gradually absorbed local cultural traits and consequently came to distinguish themselves as Moklen.

Trade relations between the Moken and many other populations are documented, and sex-biased marriages, primarily between male mainland Asian traders and Moken women, are suggested to have introduced characteristic glucose-6-phosphate dehydrogenase (G6PD) alleles into the population. The three variants observed link the Moken to Chinese and Burmese populations and, through the Thai, to other mainland Asian groups.45 The relatively high G6PD diversity compared with mtDNA variation likely reflects both malaria-driven selective pressure and low female gene flow into the Moken. Despite these documented trade networks, Sather12 suggests that the Moken were largely self-sufficient in sea foraging and were thus less influenced by the external markets of agriculturalists than other ISEA sea nomad populations. Frequent attacks by pirates and slave raiders have conditioned the Moken to avoid contact with unfamiliar outsiders as a survival strategy. More recently, the Moken pattern of semi-nomadic subsistence has become the focal point for discrimination and marginalization by the sedentary Thai majority.5 All of these factors would promote isolation consistent with the paucity of maternal genetic diversity we observe among the Moken.

The major Moken lineage was assigned to haplogroup M21d. M21 is an ancient (about 57 000 years old) yet localized haplogroup that appears only distantly related to other M types found outside SEA/ISEA.14, 27 Hill et al.15 suggest that it might be traced back to the first anatomically modern humans who settled the region at least 45 000 years ago, with the derived subclades representing deep Upper Pleistocene ancestry. M21a is most common among the Semang, and M21b is found in both the Semang and Senoi. M21c has been identified in only two Semelai individuals.14 It is unclear whether haplogroup M21d is indigenous to ISEA. Our coding region phylogram reflects the relationships among these M21 lineages: MKN1 clusters with one of the M21c Semelai sequences, and the branch joining the Semelai and MKN1 sequences with Semang Batek (M21a) and Jahai (M21b) sequences, although supported by a relatively low bootstrap value, separates haplogroup M21 from other lineages.

Given the limited whole mitochondrial sequence data available, analyses based on the more extensively sampled HVSI sequences are more useful for revealing affinities within haplogroup M21d. HVSI analyses have identified M21d in several individuals from ISEA and South China, and at high frequency among the Burmese Moken. Hill et al.15 found frequencies of 0.9% among Melayu Malays (1 of 109 samples), 2.4% in Bali (2 of 82 samples) and 0.2% across all ISEA samples. One of these Balinese sequences was identical to our MKN1 HVSI sequence; the other differed at one nucleotide position. In our median-joining network analyses, the MKN1 and Balinese sequences cluster with mainland SEA Dai and Thai HVSI sequences.

Two different patterns of movement could explain these relationships. If haplogroup M21d is indigenous to ISEA, rising postglacial sea levels may have prompted some populations to disperse from coastal areas and adopt a sea-based subsistence strategy, while genetically related groups instead moved to higher altitudes. The presence of M21d among the Moken would then reflect dispersal from the coast, whereas its presence in Bali would reflect indigenous carriers who remained. Under this scenario, the presence of the lineage in several other individuals in ISEA implies limited dispersal with early voyaging groups or genetic admixture through later trade relations.

The presence of the M21d lineage in South China, however, is not well explained by this pattern of movement. A more parsimonious explanation is that MKN1 and the related Balinese sequences share common ancestry in a coastal mainland SEA population. Both MKN1 and the related Balinese sequences cluster with the ethnic Dai of Yunnan province in southern China. The Dai were originally distributed across southeastern China46 and trace their ancestry to the ancient Pai-Yuei tribe, which also contributed to the ancestral gene pool of the Thai.24 Genetic phylogenies constructed from microsatellite data clustered the Dai and neighboring southern Chinese groups with three Taiwanese Aborigine groups, pointing to a possible source of the early Austronesian dispersal.47 The Dai, Thai, Balinese and Moken might thus ultimately share ancestry among proto-Austronesian populations of coastal East Asia.

The ancestral population of the Dai and Thai then dispersed farther inland, whereas the ancestral population of the Moken and Balinese likely dispersed coastally into ISEA. As temperatures rose in the late Pleistocene and early Holocene, the pressure of rising sea levels and of other expanding ethnic groups forced rapid movement of this population in two directions, toward both the Mergui and Balinese coasts. Genetic drift associated with small population size and isolation has pushed the M21d lineage to high frequency among the Moken, whereas the complex history of population movement and genetic admixture documented in Bali48 has contributed to its relatively low frequency there. The absence of the M21d lineage from other populations in ISEA reflects the rapidity of dispersal, extreme genetic drift and a lack of adequate sampling given the high levels of diversity within the region.

Linguistic patterns suggest contact between the Moken and populations indigenous to the Malay Peninsula, consistent with early trade relations. Although the Moken language is classified as Malayo-Polynesian within the Austronesian language family, it shows influences from Austroasiatic Mon-Khmer languages and limited but striking similarities to the Aslian branch of these languages.8, 49 The Austroasiatic languages originated in South China around 7000 years ago and today are spoken across SEA and parts of South Asia.11 The Semang, Senoi and Semelai Orang Asli speak Aslian languages (other Aboriginal Malays speak Malay dialects, which belong to the Austronesian language family).14 The linguistic influences observed suggest contact between proto-Moken–Moklen speakers and early Mon-Khmer speakers of the Malay Peninsula, likely prompted by trade,49 which could coincide with dispersal from SEA toward ISEA and into the Mergui Archipelago at the end of the postglacial rise in sea level.

The minor Moken lineage was tentatively assigned to the newly defined basal haplogroup M46, which is found almost exclusively in ISEA. M46 has an estimated age of 62 700±12 400 years and is postulated to be indigenous to ISEA.15 The affinities of MKN2 could not be determined unambiguously from our coding region phylogram, again reflecting the limited number of whole mitochondrial genome sequences available, as well as the basal position and relative rarity of the haplogroup. In multiple analyses, the MKN2 sequence clustered, with low bootstrap support, with sequences belonging to various basal M haplogroups, including those from Chinese, Filipino, Melayu and South Indian (Koraga and Kannada) populations. HVSI analyses indicate that the lineage is most closely related to the presumed ancestral sequence of those currently found in Borneo and Sumatra. It could have been introduced into the Moken early, with the dispersal of Sundaland populations carrying the lineage during the postglacial rise in sea level, or more recently, through genetic admixture fostered by trade relations.

The sharing of Moken lineages with individuals from Bali and the clustering of these sequences with ethnic Dai and Thai populations are most parsimoniously accommodated by a recent coastal mainland SEA origin, followed by dispersal into ISEA and rapid movement into the Mergui Archipelago under pressure from expanding ethnic groups and rising sea levels. Small-scale maternal gene flow from ISEA is suggested by the presence of the M46 lineage MKN2, thought to derive from the first anatomically modern human settlers of the region. Small population size, together with isolation driven by a history of slavery and marginalization, has contributed to considerable genetic drift, evident in the low mtDNA diversity among the individuals sampled. Within the past few decades, the Moken have experienced population declines and increasingly intense pressure to settle in permanent villages, which will further shape the population's genetic structure. Studies of Moken subsistence, language and traditions, and not least of Moken origins, are of even greater interest as the Moken struggle to maintain their lifestyle and identity in the face of rapid cultural change.