Introduction

Water buffalo (Bubalus bubalis) is one of the most important livestock species in several Asian countries and is used for the production of milk and meat and for draft power in rice cultivation. The domestic water buffalo in Asia is generally divided into two major subspecies, the dairy river buffalo and the draft swamp buffalo, which differ in morphology, behavior and number of chromosomes1, 2. The river buffalo is found in the Indian subcontinent, South Asia and the Mediterranean area (Italy, Egypt and the Balkans), and sporadically in Australia and South America, whereas the swamp buffalo is kept in Northeast India, China (southern regions and Yangtze valley) and Southeast Asia1, 3. Both types of water buffalo descend from the wild Asian buffalo (Bubalus arnee)4, which had a wide distribution range in eastern India, Sri Lanka and Southeast Asia until the beginning of the XIX century5,6,7,8. Lau et al.7 hypothesized that the wild Asian buffalo originated in mainland Southeast Asia and spread north toward China and west toward the Indian subcontinent, where the river type was probably domesticated.

Mitochondrial DNA (mtDNA), Y-chromosomal and nuclear microsatellite data showed a deep genetic divergence of swamp and river buffalo2, 5,6,7, 9,10,11,12,13,14,15,16, which indicates two independent domestications5, 17. Domestication of river buffalo most likely took place in the Indian subcontinent15, whereas swamp buffalo was proposed to originate from the border region between south China and north Indochina6, 17. However, the fragmentary buffalo mtDNA sequences from rDNA, COII, and cytochrome b loci reported to date5,6,7, 18 confound quantitative inferences of population history. So far, two swamp buffalo mitogenomes (NC006295/AY702618 and JN632607) and one river buffalo mitogenome (AF547270) have been deposited in GenBank. In this study, we report the mitogenomes of an additional 107 Southeast Asian swamp buffaloes, covering most of their current geographic distribution, together with one mitogenome from a Chinese river buffalo as an outgroup, in order to establish the haplogroup phylogeny and reconstruct the demographic history of the swamp buffalo.

Results

Sequence variation of swamp buffalo mitogenomes

The 109 swamp buffalo mitogenomes, with a length of 16340 to 16363 bp, belong to 87 different haplotypes (Ht.s) and are divided into 21 haplogroups or subhaplogroups (Supplementary Dataset S1). The swamp haplotypes (Hd: 0.992) contain 362 polymorphic sites (π: 0.422) with a pairwise nucleotide difference of 69.0 ± 16.5 and a synonymous/non-synonymous ratio of 4.63. Similar values have been reported for other livestock and human mitochondrial genomes19,20,21. Seventy-five swamp haplotypes are observed once, while the most frequent haplotypes, HT22 and HT55, occur five and seven times, respectively. Forty-eight haplotypes with 59 sequences belong to lineage SA and 34 haplotypes with 44 sequences belong to lineage SB. The remaining 5 haplotypes belong to the rare haplogroups SC (2), SD (2) or SE (1). The two river buffalo mitogenomes belong to the previously described1, 6 clades R1 and R2.
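As an illustration of how a haplotype diversity such as Hd = 0.992 arises from the counts above, the following minimal sketch applies the standard unbiased estimator, Hd = n/(n − 1) × (1 − Σp²). The choice of estimator is an assumption, since the text does not state which formula was used. The text gives the counts for 77 of the 87 haplotypes (75 singletons, HT22 and HT55); the split of the remaining 22 sequences over the other 10 haplotypes is hypothetical and serves only to complete the example.

```python
def haplotype_diversity(counts):
    """Unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two sequences")
    sum_p2 = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1.0 - sum_p2)

# From the text: 75 haplotypes observed once, HT22 five times, HT55 seven times.
# ASSUMPTION (not in the source): the other 10 haplotypes carry the remaining
# 22 sequences as eight haplotypes seen twice and two seen three times.
counts = [1] * 75 + [5, 7] + [2] * 8 + [3] * 2

# Sanity check against the totals reported in the text.
assert len(counts) == 87 and sum(counts) == 109

hd = haplotype_diversity(counts)
print(f"Hd = {hd:.3f}")  # prints "Hd = 0.992"
```

Because the sample is dominated by singletons, other plausible splits of the 22 unassigned sequences change the result only in the fourth decimal place.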

Swamp Buffalo Mitochondrial Phylogeny

A maximum parsimony (MP) tree based on the 111 water buffalo mitogenomes (89 haplotypes) confirms two distinct branches, river and swamp buffalo, which are separated by 300 substitutions (Supplementary Dataset S2). The river buffalo branch includes 2 haplotypes belonging to lineages R1 and R2. The remaining 87 swamp buffalo haplotypes cluster into five divergent haplogroups (Hg.s), namely SA, SB, SC, SD and SE, with an overwhelming representation of SA (54.1%) and SB (40.4%). We largely confirm previous phylogenies6, 10, but also reveal the novel haplogroups SA3 and SB4 and many different subhaplogroups. Lineages SA and SB are divided into three (SA1 to SA3; SA1′2 as an ancestral node) and four (SB1 to SB4; SB2′3′4 as an ancestral node) sublineages, respectively. A single control-region transition at position 16066 defines a major (32.1%) star-like subclade, SA1a. For the three rare lineages SC (1.8%), SD (2.8%) and SE (0.9%), the complete mtDNA sequences confirm that the split-off of lineage SC preceded the SA-SB divergence and that SD is a sister clade of SB, and also indicate that SE and SA are sister clades. Maximum likelihood (ML) and Bayesian Evolutionary Analysis Sampling Trees (BEAST) analyses retrieved remarkably similar tree topologies (Supplementary Figs S1 and S2).
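The haplogroup shares quoted above can be reproduced from the sequence counts given in the sequence variation section (59 SA and 44 SB sequences among 109 swamp mitogenomes). The short sketch below is only an arithmetic check; the variable names are illustrative.

```python
n_sequences = 109                       # swamp buffalo mitogenomes
lineage_counts = {"SA": 59, "SB": 44}   # sequences per major lineage, from the text

# Percentage of all swamp mitogenomes carried by each major lineage.
shares = {hg: round(100 * c / n_sequences, 1) for hg, c in lineage_counts.items()}

# Whatever is left belongs to the rare lineages SC, SD and SE combined.
rare_count = n_sequences - sum(lineage_counts.values())
rare_share = round(100 * rare_count / n_sequences, 1)

print(shares)                  # {'SA': 54.1, 'SB': 40.4}
print(rare_count, rare_share)  # 6 5.5
```

The combined rare share of 5.5% agrees with the sum of the individual values reported for SC, SD and SE (1.8% + 2.8% + 0.9%).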

Molecular Clocks and Age Estimates

Previously reported estimates of the divergence time between river and swamp buffalo range from 10 Kya to 1.7 Mya2, 5, 7, 9,10,11, 13,14,15,16, 18, 22, 23, partially because different mtDNA segments were analyzed and different evolution rates were applied. In the present study, we phylogenetically compared 111 water buffalo mitogenomes with one African buffalo (Syncerus caffer; NC020617), while rooting our MP tree with one Bos taurus (V00654.1) and one ancient Bos primigenius (GU985279) mitogenome. The divergence pattern evaluated on all 114 bovine mitogenomes (Supplementary Fig. S3) confirms saturation of the D-loop divergence, emphasizing that sequencing of whole mitogenomes is essential for quantification.

We first considered the synonymous mutations for a ML estimation of the molecular age of phylogenetic nodes (Fig. 1). Using a fossil age estimate of the Bovini tribe of 8.8 Mya24 and the age (6.7 Ky) of the ancient Bos primigenius mtDNA25, we calculated a rate of (3.75 ± 0.47) × 10^−5 synonymous substitutions per nucleotide per Ky for 3790 amino acid codons, equalling 1 synonymous substitution every ~7.030 Ky. This yields divergence times of ~913 ± 78 Kya for the water buffalo mitogenome, ~82 ± 18 Kya for the two river buffalo haplotypes (R1 and R2) and ~232 ± 35 Kya for the swamp buffalo haplogroups (Table 1 and Fig. 1). At least 12 different ancestral mtDNA haplotypes of the current swamp buffaloes were present in Southeast Asia during the early Neolithic (11–6 Kya), which overlaps with the initial phase of domestication26. These haplotypes were the ancestors of the eight (sub)haplogroups SA1, SA2, SB1a, SB1b, SB2a, SB2b, SB3 and SB4, still common in modern herds, and of the rare SA3, SC, SD and SE.
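The conversion from the synonymous rate to "one substitution every ~7 Ky" can be checked with a few lines of arithmetic. The sketch below assumes that the rate is multiplied by 3790 sites (one per codon), because this is the reading that reproduces the interval quoted in the text; multiplying by the full 11370 bp of the coding alignment would instead give roughly 2.3 Ky. The helper `expected_substitutions` is hypothetical and only illustrates how an age translates into an expected number of synonymous changes along a single lineage.

```python
rate_per_site_per_ky = 3.75e-5  # synonymous rate reported in the text
n_sites = 3790                  # ASSUMPTION: one site per codon (3790 codons)

# Expected synonymous substitutions per Ky over the whole alignment,
# and its reciprocal: the average waiting time between substitutions.
subs_per_ky = rate_per_site_per_ky * n_sites
ky_per_substitution = 1.0 / subs_per_ky

def expected_substitutions(age_ky):
    """Expected synonymous substitutions along one lineage of the given age (in Ky)."""
    return age_ky * subs_per_ky

print(f"one synonymous substitution every {ky_per_substitution:.2f} Ky")  # 7.04 Ky
for label, age in [("river vs swamp", 913), ("swamp haplogroups", 232), ("R1 vs R2", 82)]:
    print(f"{label}: ~{expected_substitutions(age):.0f} substitutions per lineage")
```

The small difference from the ~7.030 Ky given in the text is consistent with rounding of the published rate.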

Figure 1

Phylogeny of complete mtDNAs from 111 buffalo mitogenomes. The topology was inferred by maximum parsimony (Supplementary Dataset S2). A maximum likelihood time scale, based on synonymous substitutions, is indicated below the tree. The standard errors for the major nodes are represented by dotted lines; further details are available in Table 1. Samples are indicated by 89 different haplotype IDs (Supplementary Dataset S1). The insert shows the geographic distribution of major haplogroups based on these complete mtDNAs. Samples from China have been divided into three regions (Yangtze Valley, Southwest China and Southeast China). The map was drawn by hand in Adobe Photoshop (v. 8.0; http://www.adobe.com).

Table 1 Age estimates of major buffalo branches based on different mitochondrial datasets.

Time estimates were confirmed by (i) using all open reading frame mutations and (ii) considering all mutations partitioned into coding and control regions and evaluated with both ML and BEAST (Table 1). However, slightly deleterious mutations within the open reading frames may lead to overestimation of the age of younger clades (Supplementary Fig. S4)19, 21. The overall mutation rate was estimated at (2.11 ± 0.34) × 10^−8 substitutions per nucleotide per year (1 mutation every ~2900 years) over the entire mitogenome. As for the river buffalo internal variation, a better estimate will be obtained by analyzing more mitogenomes.
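The figure of one mutation every ~2900 years follows directly from the overall rate and the mitogenome length. As a quick check, the sketch below uses the shortest and longest mitogenome lengths reported in the Results (16340 and 16363 bp); which length was actually used for the published figure is not stated, so both are shown.

```python
rate_per_site_per_year = 2.11e-8   # overall rate reported in the text
genome_lengths_bp = (16340, 16363)  # shortest and longest swamp mitogenomes

def years_per_mutation(rate, length_bp):
    """Average waiting time between mutations anywhere in a genome of the given length."""
    return 1.0 / (rate * length_bp)

for length in genome_lengths_bp:
    years = years_per_mutation(rate_per_site_per_year, length)
    print(f"{length} bp: one mutation every {years:.0f} years")
# 16340 bp: one mutation every 2900 years
# 16363 bp: one mutation every 2896 years
```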

Estimating Past and Present Demographic Trends

The Bayesian skyline plot (BSP) of swamp buffalo mitogenomes shows three major changes in the effective female population size: i) a slight decrease between about 200 and 130 Kya; ii) a more recent decrease starting around 25–20 Kya and becoming much steeper during the early Neolithic (11–6 Kya); and iii) a rapid increase from 3 Kya (Fig. 2). This recent increase explains the star-like topology of some haplogroups (e.g. SA1, SA2, SB1a, SB3 and SB4; Fig. 1) and is very similar to previous analyses of water buffalo and other domestic bovine species27.

Figure 2

Bayesian skyline plot showing the swamp buffalo population size trend. The Y axis indicates the effective number of females, as inferred from our mitogenome dataset considering a generation time of six years49. The black solid line is the median estimate and the blue shading shows the 95% highest posterior density limits.

An analysis of the geographic distribution of swamp haplogroups in Southeast Asia, based on the control-region data available in previous literature6 or deposited in GenBank (Supplementary Table S1), confirms the prevalence of SA or SB mtDNAs (>99%) and shows contrasting geographic distributions of the subhaplogroups SA2, SB1, SB2 and SB3 (Supplementary Fig. S5). The rare haplogroup SC was previously reported to occur in Thailand, Bangladesh and sporadically in Southwest China, while SD and SE were found only in Thailand6. We confirmed the presence of SC in the Southwestern Chinese Dehong population, but the haplogroups SC and SE were also found in the Yibin and Poyanghu breeds from the Yangtze Valley (inset in Fig. 1).

Discussion

We sequenced complete mitogenomes of the swamp buffalo in order to reconstruct the phylogenetic relationships of the mitochondrial haplotypes and to obtain a time scale for the phylogeny. Most of the major haplogroups were already identified in previous studies2, 5, 7, 9,10,11, 13,14,15,16, 18, 22, 23, but we recognized the novel haplogroups SA3 and SB4 and defined the subhaplogroups SA1a, SA1a1, SA1a2, SA1a3, SB1a, SB1a1, SB1a2, SB1b, SB2a, SB2b, SB3a, SB3a1, SD1 and SD2 (Supplementary Datasets S1 and S2).

During the glacial periods, drastic ecological and climatic changes had major effects on the distribution of plants and animals28, 29. As part of the Indo-Pacific Warm Pool, Southeast Asia was relatively warm during the Last Glacial Maximum (~26–19 Kya)30, with a temperature decrease of only 2.5 °C vs 5–10 °C globally31, 32. This made Southeast Asia a major global biodiversity hotspot33 in which many species survived, in glacial refugia, the fluctuations of temperature and forest coverage during the Pleistocene (for the latter, see Supplementary Fig. S6)28.

Estimations of divergence times are inherently imprecise34, but we improved the accuracy of our estimates by using the aurochs sequence as an internal calibration point, in addition to a fossil age of the Bovini tribe of 7–11 Mya24. According to our estimate, the divergence of the swamp and river types of the water buffalo took place almost at the beginning of a glacial period (~900 to 860 Kya). Remarkably, from the swamp matrilineal diversity that must have formed until 200 Kya, only the minor haplogroup SC and the ancestor of all other haplogroups (SA′B′D′E) have survived. We propose to divide the last 200 Ky into five phases, correlating glacial periods and estimates of demographic/phylogenetic history (Figs 1 and 2 and Supplementary Fig. S6).

1. During the second Pleistocene glacial period (~200 to 130 Kya), the first population decline was observed in the BSP, while two macro-haplogroups (SA′E and SB′D) diverged (Figs 1 and 2).

2. The first phase of the last Pleistocene glacial period (~110 to 50 Kya) was still comparatively moderate; the population size remained almost unchanged and only one divergence event (SA-SE) has been identified.

3. During the second phase of the last Pleistocene glacial period (~50 to 11 Kya), the population began to decline. The current demographic composition of swamp populations shows that ~99% of current mitogenomes are derived from only two ancestral haplotypes (SA1′2 and SB), both dated around the LGM (~26–19 Kya). Afterwards, the major haplogroups SA1′2 and SB differentiated into eight (sub)haplogroups (SA1, SA2, SB1a, SB1b, SB2a, SB2b, SB3 and SB4).

4. After 11 Kya, the increasing temperature raised the sea level, which had a profound impact on the regions from Sundaland to Southeast China35. Paleoenvironmental data on the Holocene indicate a warm period between 11 and 6 Kya in southern China, known as the early Holocene optimum36. This overlapped with the first phases of rice cultivation, which is believed to have triggered the domestication of the swamp buffalo at 7–3 Kya and an expansion of the water buffalo population1, 6, 10, 14, 26, 37.

5. Finally, we observe a rapid increase from 3 Kya to the present population size. As previously proposed27, this was due to the expansion of the domestic buffalo to its large current distribution range, which harbors several populations with distinct haplogroup distributions.

Thus, the demographic history of swamp buffalo seems to be linked to past glacial events, which established isolated refugia of swamp buffalo in which only a limited number of subhaplogroups survived. This may explain why only one divergence event (SA-SE) has been dated to the period of 180–40 Kya. The current demographic composition of swamp populations suggests that ~99% of current mitogenomes are derived from only two ancestors (SA1′2 and SB), which are both dated around the Last Glacial Maximum (~26–19 Kya). The time estimates in Fig. 1 and Table 1 also indicate that the divergence of the current domestic subhaplogroups SA1, SA2, SB1a, SB1b, SB2a, SB2b, SB3 and SB4 preceded domestication, which gives eight haplotypes as a minimum estimate of the swamp buffalo diversity captured by the first farmers.

Our account of the pre-domestic history of the swamp buffalo provides a context for previous studies of the diversity of the mtDNA control region and cytochrome b gene6, 10, which are summarized in Supplementary Fig. S5. The high diversity of domestic SA and SB haplotypes in the China-Vietnam border region was proposed as evidence supporting an initial major domestication event of swamp buffalo in Southeast Asia, probably between southern China and Vietnam. The finding of SC, SD and SE haplotypes almost exclusively in Thailand and Bangladesh suggests incorporation of these haplotypes after the domestic buffaloes had reached the west bank of the Mekong river6, in a scenario of recurrent restocking of the domestic population with wild females as proposed previously for the horse20, 38, 39. Extending the analysis to other loci or even to the entire genome, together with a wider sampling covering the entire geographic range of the swamp buffalo, is desirable to further unravel the domestication and subsequent demographic history of swamp buffalo and to enable a comparison with the related river buffalo6.

Methods

Sample Collection

Most of the samples used for this work had been collected in previous collaborative work5, 10. All samples had already been classified into mtDNA haplogroups based on control-region data. We selected 107 mtDNAs for complete sequencing in order to represent all swamp lineages and to include the highest possible molecular variability while avoiding potential redundancies. An additional mitogenome from a river buffalo was sequenced to be used as an outgroup.

Mitogenome sequencing

DNA was extracted10 from 107 swamp buffaloes (blood, ear tissue and hair follicle) from China (46), Laos (23), Myanmar (3), Thailand (14), Vietnam (16) and India (5) and from one Chinese river buffalo. Complete mitogenome sequences were obtained by using two different approaches: 1) PCR amplification (with 27 primer pairs, Supplementary Table S2A)40 and Sanger sequencing; 2) Long-Range PCR amplification (Supplementary Table S2B) and Illumina sequencing41. The GenBank accession number, sequencing method, coverage and depth for each sample are reported in Supplementary Dataset S1. MtDNA genome sequences were analyzed using the DNASTAR 7.0, Sequencher v5, DnaSP v5, Clustal X and GeneSyn packages.

All experimental procedures were performed in accordance with the Regulations for the Administration of Affairs Concerning Experimental Animals approved by the State Council of the People’s Republic of China. The study was approved by the Institutional Animal Care and Use Committee of Northwest A&F University (Permit Number: NWAFAC1019).

Phylogeny Construction and Demographic Inferences

The phylogeny was constructed by hand following a maximum parsimony (MP) criterion and confirmed using an adapted version of mtPhyl (v. 4.015)42, as previously described20, 21, 43, 44. The modified .txt files to be loaded in the program are available upon request. The tree was rooted on the Bos taurus reference sequence (V00654.1) and on the ancient Bos primigenius mtDNA (GU985279). A maximum likelihood (ML) tree was computed using MEGA (v. 7.0)45 with 1000 bootstrap replicates.

A first ML analysis was performed using PAML X46 by considering only synonymous mutations in the protein-coding genes. The ND6 gene was reverse-complemented to present the same reading direction as the other genes, and non-synonymous substitutions were replaced with the ancestral bases. Stop codons were excluded from the analysis. The coding genes were concatenated and the final alignment (11370 bp, 3790 codons) was analyzed with CODEML to calculate a synonymous mutation rate. A second tree was calculated in the same way, but considering all coding mutations. The third and fourth trees with molecular ages were calculated with PAML X and BEAST v. 1.8.3 software47, respectively, while considering two partitions in the molecule corresponding to the coding (including all genes coding for mRNA, rRNA and tRNA) and control regions. Modelgenerator v.85 indicated HKY + G + I as the best-supported model for our dataset according to the AIC2 and BIC criteria. This substitution model, with site heterogeneity described by 8 gamma categories (the lowest number that significantly increased the likelihood, >1.0), was selected for the subsequent ML and BEAST estimates. The generalized likelihood ratio statistic was always used to verify the clock hypothesis. In order to calibrate the molecular clock, we built a Bovini tree by including one African buffalo (NC020617) and two Bos mitogenomes (one Bos taurus, V00654; one ancient Bos primigenius, GU985279) used as outgroups. For the calibration point we used the estimated fossil-based age of the Bovini tribe (8.8 ± 1.1 My; 95% CI: 7–11 My)24. Since multiple calibration points are preferable24, the age of the ancient Bos primigenius (6.7 ± 0.2 Ky; 95% CI: 6.3–7.1 Ky) was also used as an internal (recent) calibration point. The major haplogroups were considered as monophyletic in order to calculate their age estimates. The analyses were also repeated excluding the aurochs sequence, but the estimates changed by only ~4% on average. We then obtained a Bayesian skyline plot (BSP)48 from the swamp buffalo phylogeny by running 50,000,000 iterations with samples drawn every 10,000 steps. We constructed spatial frequency distribution plots with the program Surfer 9 (Golden Software, http://www.goldensoftware.com/products/surfer) by using the control-region data available in previous literature6 or deposited in GenBank (Supplementary Table S1).
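The preparation of the coding alignment described above (reverse-complementing ND6, excluding stop codons and concatenating the genes) can be sketched as follows. This is only an illustration under stated assumptions, not the pipeline actually used: the function names and the two toy sequences are invented for the example, the stop codons are those of the vertebrate mitochondrial genetic code, and the replacement of non-synonymous substitutions with ancestral bases is not shown.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
# Stop codons of the vertebrate mitochondrial genetic code.
MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def strip_stops(cds):
    """Split a coding sequence into complete codons and drop stop codons.

    Trailing bases that do not form a full codon (e.g. incomplete stops) are ignored.
    """
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return [c for c in codons if c.upper() not in MITO_STOPS]

def concatenate_genes(genes, reverse_strand=("ND6",)):
    """Concatenate genes in reading direction, reverse-complementing those listed."""
    codons = []
    for name, seq in genes.items():
        if name in reverse_strand:
            seq = reverse_complement(seq)
        codons.extend(strip_stops(seq))
    return "".join(codons)

# Toy, made-up sequences (NOT real buffalo genes), given as they appear on the
# reference strand; ND6 is encoded on the opposite strand.
toy_genes = {"ND1": "ATGCCATTCTAA", "ND6": "TTAGGGCAT"}

alignment = concatenate_genes(toy_genes)
print(alignment, len(alignment) // 3)  # ATGCCATTCATGCCC 5

# Consistency check on the real alignment size quoted in the text.
assert 3790 * 3 == 11370
```

With the real gene coordinates, the same steps would be expected to yield the 11370 bp (3790 codon) alignment that was passed to CODEML.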

Data accessibility

Sequences of the novel water buffalo mitogenomes have been deposited in GenBank under accession numbers KX758295 - KX758402 (108 complete mtDNAs).