Introduction

A majority of the Mexican population is considered to be mestizo, which could be defined as a person who was born in Mexico, has a Spanish-derived last name and has Mexican ancestors back to the third generation.1 Mestizos originated from the admixture among Spaniards, native Amerindians and Africans about 500 years ago.2, 3 Mexico City and its metropolitan regions are located in the valley of Mexico, which is situated in central Mexico. This geographical area was the heart of the Aztec empire and the most populated region of Mesoamerica before the arrival of the Spaniards.4 In addition, a large number of indigenous people lived in the southern region of Mexico in well-established and populated Mesoamerican cities.5, 6, 7 In contrast, the indigenous populations settled in the northern region of Mexico comprised few individuals, who resided in small communities close to the rivers, and most of them were nomads and hunters.4, 8 After the conquest was concluded, most Spaniards settled in Mexico City and a large part of the indigenous population was displaced to small villages around the city in the valley of Mexico.5 During the following 300 years, colonial Mexico received a continuous flow of immigrants from Europe, in particular Spain, and to a lesser extent black slaves from Africa in the sixteenth century, who were settled mainly in the coastal region.9 In this period, the Spaniards expanded to the northern and southern regions of New Spain and founded the colonial cities, which were located particularly in areas rich in mineral reserves.10 In agreement with historical data, estimates of admixture based on classical nuclear genetic markers have shown that European, Amerindian and African ancestries vary across the geographical regions of the country.11 Amerindian ancestry predominates in all regions, whereas the proportion of European and African ancestry is variable. 
There is a higher proportion of individuals with Caucasoid ancestry in Mexico City (35–71%) and the northern regions (35–55%) than in the central regions (21%) around Mexico City and the southern regions (4.2–43%) of Mexico, whereas the African ancestry predominates in the coastal regions (22–41%).12 On the other hand, European ancestry, explored using mitochondrial DNA (mtDNA), was absent in the indigenous population13, 14, 15, 16 and was very low in the mestizos either from the north (5.3%)17 or from Mexico City (3.2%).12, 18 Although those Mexican mestizos come from modern populations, the fact that mtDNA is maternally inherited19 without recombination indicates that the mestizo mitochondrial genome was initially inherited from indigenous women. This agrees with historical data because mating has been described mainly between Spanish men and indigenous women.20, 21 As mtDNA has a higher mutational rate than the nuclear genome,22 it has been used as a tool in population genetics for making inferences about ancestry and the migration patterns of human populations.23 Furthermore, the Amerindian mtDNA has been well characterized, and the haplogroups found in indigenous populations in the Americas were first identified and named A, B, C, D and X.15, 24, 25 Those Amerindian haplogroups were recently termed A2, C1, B2 and D1, once more data on the complete mtDNA genome sequence became available.26, 27, 28 As most studies that analyzed mtDNA polymorphism in the Mexican population used mainly restriction fragment length polymorphisms (RFLPs),13, 17, 18, 29 and there are no available data from full control region sequences, the knowledge of mtDNA diversity is limited and it has not been extensively analyzed.

To describe the genetic variability in the mtDNA control region present in the mestizo population, the hypervariable region I (HVR1) and hypervariable region II (HVR2) were analyzed in 270 unrelated Mexican mestizos. In addition, the mestizo population from Mexico City was compared with mestizo populations from the central and coastal regions of Mexico. Besides the utility of the present knowledge for forensic purposes and population genetics, mtDNA in conjunction with other genetic markers could be helpful to estimate the individual ancestry proportion in admixed populations. These ancestry estimates will be useful to match cases and controls adequately in genetic association studies and to avoid the bias caused by population stratification.30

Materials and methods

Samples

Blood samples were collected from 270 unrelated healthy women attending the colposcopy screening program at the General Hospital of Mexico. All women were born in Mexico and had Mexican ancestry for two generations. Of the 270 individuals, 100 were born in Mexico City, 115 in the central, 47 in the coastal (8 from the Gulf coast and 39 from the Pacific coast) and 8 in the northern regions of the country (Figure 1). All the women agreed to participate in the study and signed a written informed consent form. The protocol and procedures in this study were reviewed and approved by the local ethics and scientific committee of the General Hospital of Mexico.

Figure 1

Geographical location of individual birthplaces. Samples were grouped into those individuals who were born in Mexico City (black arrow; n=100), in the coastal (Veracruz, Michoacán, Guerrero, Oaxaca, Chiapas; n=47), northern (Jalisco, San Luis Potosi, Zacatecas, Durango, Sinaloa; n=8) and central regions excluding Mexico City (Estado de México, Hidalgo, Puebla, Tlaxcala, Morelos, Queretaro, Guanajuato; n=115).

DNA extraction and sequencing

Whole genomic DNA was extracted using a commercial DNA extraction and purification kit (Qiagen, Valencia, CA, USA) according to the manufacturer's protocol. The full control region (1261 bp) was amplified by PCR from 500 ng of whole genomic DNA in a 25 μl reaction containing 2 mM MgCl2, 1 × PCR buffer (50 mM KCl, 10 mM Tris-HCl, pH 8.3), 200 μM of each of dATP, dCTP, dGTP and dTTP, 1 μM of each primer and 1 U of Taq Platinum DNA polymerase (Invitrogen, Carlsbad, CA, USA). Reactions were carried out in a GeneAmp PCR System 2720 (Applied Biosystems, Carlsbad, CA, USA). After denaturing for 2 min at 94 °C, 32 cycles were performed, each consisting of 94 °C for 45 s, 60 °C for 1 min and 72 °C for 2 min. Finally, the reaction was held at 72 °C for a further 7 min. Both DNA strands of HVR1 (position 16024–16365) and HVR2 (position 73–340) were sequenced using the fluorescent cycle-sequencing method (BigDye Terminator Ready Reaction Kit, Applied Biosystems). Sequence analysis was performed using an ABI PRISM 3130 genetic analyzer system (Applied Biosystems). The sequence and position of the forward (F) and reverse (R) primers for PCR and sequencing are as follows:

L-15926 (5′-ATACACCAGTCTTGTAAACC-3′, position 15 907–15 926),

H-00580 (5′-TTGAGGAGGTAAGCTACATA-3′, position 599–580),

H-16410 (5′-CACGGAGGATGGTGGTCAAG-3′, position 16 414–16 395),

L-00015 (5′-CACCCTATTAACCACTCACG-3′, position 15–34) and

H-00383 (5′-AGGCTGGTGTTAGGGTTCTT-3′, position 383–364).

Data analysis

The mtDNA HVR1 and HVR2 sequences of each individual were compared with the revised Cambridge Reference Sequence (rCRS)31 using the Applied Biosystems SeqScape ver. 2.5 Software. The polymorphisms identified in each sample were stored in a Microsoft Office 2007 Access database to construct and identify the haplotypes, and calculate the allelic and haplotype frequencies. Calculations of the mean number of pairwise differences, nucleotide diversity, haplotype diversity indexes, mismatch distribution, tau (τ) and Tajima's D neutrality test were performed using the population genetics software package Arlequin ver. 3.11.32 The motifs present in both regions that characterize the Amerindian,27, 28, 33 European34, 35 and African36 mitochondrial haplogroups were used to classify each sample by haplogroup. The statistical parsimony algorithm was applied to construct a minimum spanning network with control region sequences using the TCS ver. 1.21 software.37, 38 Positions not fitting an infinite sites model because of the presence of mutation heterogeneity in both regions (16 051, 16 093, 16 126, 16 129, 16 172, 16 187, 16 189, 16 223, 146, 150, 152, 182, 189, 195) were removed from the analysis to reduce the degree of homoplasy.39 Although those sites were removed, not all the reticulations were eliminated in the network, and the remaining loops were resolved using algorithms based on coalescent theory.40 To evaluate the significance of differences in the proportion of mitochondrial haplogroup ancestry among the groups, a χ2 test was performed. The significance of differences in the mean pairwise differences and the nucleotide diversity was evaluated using Student's t-test. All statistical tests were carried out using the Minitab statistical software.
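The diversity indexes named above can be illustrated with a minimal sketch, assuming the textbook definitions (Nei's unbiased haplotype diversity and the average number of differences over all sequence pairs). The function names and the toy sequences are hypothetical and are not data from the study; the options actually used in the population genetics package (for example, the handling of indels) are not stated in the text and may differ.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum(p_i^2))."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


def mean_pairwise_differences(sequences):
    """Average number of differing sites over all pairs of aligned sequences."""
    pairs = list(combinations(sequences, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


# Hypothetical aligned toy sequences of equal length (not study data).
toy = ["ACGTAC", "ACGTAC", "ACGTTC", "ATGTAC"]

h = haplotype_diversity(toy)        # diversity among the three distinct haplotypes
k = mean_pairwise_differences(toy)  # mean number of pairwise differences
pi = k / len(toy[0])                # nucleotide diversity per compared site
```

In this sketch the nucleotide diversity is simply the mean number of pairwise differences divided by the number of aligned sites; which sites enter that denominator is a choice of the analysis software.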

Results and discussion

From 610 nucleotides analyzed, a total of 162 substitutions were found in 154 polymorphic positions (102 in HVR1 and 52 in HVR2). A total of 202 different haplotypes were identified in the whole sample (n=270). The number of polymorphic positions present in the haplotypes ranged from 1 to 16, with a mean of 8.9. In all, 168 haplotypes (83.1%) were identified only once and the most frequent haplotype (haplotype Mx7) was present in 15 (5.6%) individuals (see Table 1). It is noteworthy that most haplotypes (192; 95%) have not been reported earlier (Table 1).27, 39, 40, 41, 42 The haplotype diversity of the whole control region was 0.9945, with a mean number of pairwise differences of 11.14 and a nucleotide diversity of 0.0655 (see Table 2). Unlike HVR1, HVR2 had 15 sites that showed indels either as a single-point insertion (insertion at 180G) or as part of polymorphic microsatellite (p248–249, p286–291, p303–309 and p311–315) and non-microsatellite (p106–111) tracts. In the poly-cytosine tract p303–309, only 26% of individuals showed seven cytosines like the rCRS, whereas most individuals (54.8%) had eight cytosines and 18.8% had nine cytosines (see Table 3). Interestingly, in the other microsatellite at p311–315, all but one individual had six cytosines, one more cytosine than in the rCRS. Both homopolymeric cytosine stretches are known to be highly variable;43 however, in our sample only the microsatellite p303–309 was polymorphic. The frequency of alleles in the tract p303–309 in our sample was similar to the frequencies reported in other populations.39, 44, 45, 46, 47, 48 The deletion of two adenines in the poly-A tract at positions p286–291 was always linked with the deletion of one adenine at position p248–249 and this combination was found in 17.1% of the individuals examined, whereas the remaining individuals (82.9%) had six adenines in the first and two adenines in the last positions, as in the rCRS. 
A deletion of six nucleotides from position 106 to 111, which has been described earlier in the Chibchan Amerindian population,49 was found in four individuals. The insertion of a guanine at position 180 was found in only one individual.

Table 1 Haplotypes found in the 270 individuals
Table 2 Genetic diversity indexes calculated for the whole sample and Amerindian and European haplogroups
Table 3 Allele frequencies of microsatellite loci in the tracts 303–309 and 311–315 from hypervariable region II

Mitochondrial haplogroups

In the whole sample, 252 individuals (93.3%) were assigned to one of the Amerindian haplogroups. The Amerindian haplogroups found were A2, B2, C1 and D1, with a frequency of 51.1, 17.8, 18.5 and 5.9%, respectively (Figure 2b). The frequency of the non-Amerindian haplogroups found in the sample was 6.7%, of which European haplogroups represented 5.9% (16 individuals) and the African haplogroups 0.7% (2 individuals). The Amerindian ancestry proportion found in this sample is slightly higher than, but not significantly different from, that reported earlier in a study of the mestizo population from Mexico City (90.5%), in which haplogroups were identified by RFLPs.18 That study included persons with social security, whereas the women enrolled in this study attended the Pap screening program at the General Hospital of Mexico, which regularly serves people with lower income who lack social security.50 It has been suggested earlier that, in the Mexican mestizo population, the proportion of Amerindian ancestry could be higher in people from low socioeconomic strata than in those with a higher income.11, 51

Figure 2

Mitochondrial haplogroups in the Mexican mestizo population. (a) Minimum spanning network constructed with mitochondrial DNA (mtDNA) control region sequences from the Mexican mestizo population. Colored circles in the network show the haplotypes within each haplogroup and their relationships; the circle sizes are proportional to the haplotype frequencies. Black circles represent inferred haplotypes and the black arrow shows the position of the rCRS in the network. The main mutational steps are depicted. (b) Mitochondrial haplogroup frequency found in the total sample; the colors of the haplogroups are the same in both panels.

Amerindian mitochondrial haplogroups

Haplogroup A2

The polymorphisms that characterize this haplogroup are the transitions C16111T, C16223T, C16290T, G16319A, T16362C, T146C, A153G and A235G.27, 28 In addition to these characteristic polymorphisms, 50 haplotypes, including the haplotype Mx7, showed the transitions A73G and A263G (Table 1). As most reported studies analyzed only the HVR1, the positions of this region were compared with reported sequences. HVR1 polymorphisms found in haplotype Mx7 (11.1% of all haplotypes when only HVR1 was considered) have been reported earlier in other studies but in lower proportions (5–8%).39, 40, 41, 42 Of the 93 haplotypes that composed the haplogroup A2, 77 had a thymine at position 16111 and only 16 had the cytosine, as in the rCRS. Before the classification suggested by Bandelt,26 haplogroup A was split into A1 and A2 by the transition at this position.33 However, further studies reported that the C allele at position 16111 in the A2 haplotypes was instead a back mutation from T to the C allele of the rCRS, because of the high mutation rate present in the mtDNA control region.28 The transition C16111T is characteristic of the Amerindian A2 haplogroup because it is rarely found in Asian haplogroup A sequences.52, 53 Furthermore, this C16111T polymorphism has rarely been seen in other haplogroups.54, 55 Substitutions A153G and A235G were highly frequent in the haplogroup A2, with a frequency of 88.4 and 94.2%, respectively. Although these positions characterize the A2 haplogroup, they were also found, albeit rarely, in other Amerindian (C and B) and non-Amerindian (H) haplogroups in this sample. Interestingly, the transition T152C was found in only 17.8% of the A2 haplogroup sequences, whereas in Asian haplogroup A sequences it was found in 16 out of 18 sequences.56

Haplogroup B2

This haplogroup was present in 48 of the 270 individuals analyzed and it was the third most common of the Amerindian haplogroups found in this sample. The HVR1 motifs that are characteristic of haplogroup B2 are the transitions T16189C and T16217C.27, 52 However, other positions associated with B2 haplotypes are the transversions A16182C and A16183C.28, 39 The haplotype Mx101 was the most frequent in haplogroup B2 (four individuals) and carried the polymorphisms A16182C, A16183C, T16189C, T16217C, A73G and A263G (Table 1). Considering only the positions in HVR1, this haplotype has been reported earlier once in Latin America.40, 41, 42 Interestingly, the polymorphism A16183C was identified in all haplogroup B2 sequences. The haplogroup B2 in this study was further divided into two subgroups, based on transversion A16182C, which was found in 54.1% of the B2 haplotypes. These data contrast sharply with an earlier study in the Mexican mestizo population,17 in which transversions A16182C and A16183C were not found in 33 individuals whose HVR1 region was analyzed. On the other hand, transversion A16183C has already been reported in all B haplotypes analyzed in Latin America and almost all from Asia, whereas transversion A16182C was absent in the Argentinean population42 and was found in low frequency in Colombia and Brazil (25%).40, 41, 42, 53 The polymorphism A16182C is widely distributed among Asian B haplogroups, mainly in the subhaplogroups B4a and B4b.53, 55, 56 In fact, it has been suggested that the Amerindian haplogroup B2 descended from the haplogroup B4b.26, 57 According to the updated phylogeny, B haplotypes having the transversion A16182C belong to subhaplogroup B2c.28

Haplogroup C1

Haplogroup C1 was present in 50 (18.5%) individuals of the total sample and was the second most common haplogroup. Earlier reports have shown that the frequency of haplogroup C1 varies across distinct native American ethnic groups in Mexico, having a frequency similar to haplogroup B2.16, 29 The C1 haplogroup is defined by polymorphisms C16223T, T16298C, T16325C, C16327T, A73G, A263G, and the adenine deletions at positions 249, 290 and 291 of HVR2.26, 27, 28 The haplotype Mx144 had the characteristic polymorphisms of the C1 haplogroup and was the most commonly detected (five individuals; Table 1). Considering only the HVR1, this haplotype has been reported earlier in populations from Mexico and Latin America.17, 40 Although the adenine deletions in the poly-A tracts of HVR2 are characteristic of haplogroup C, they have also been reported, although rarely, in other haplogroups.40, 46 For instance, in Asian populations these deletions have been found in the haplogroups A, F and R9.53, 57 In contrast, in the haplogroups C4c and L2b the deletion of one adenine at position 248–249 is not accompanied by deletion of two adenines at positions 290–291.27, 40 Interestingly, the transition A16051G, which has been associated with haplogroup C1d, was found in 32% (16 individuals) of the C1 haplotypes in this sample.

Haplogroup D1

This haplogroup was the least frequent Amerindian haplogroup and it was found in only 16 (5.9%) individuals. The HVR1 polymorphisms characteristic of this haplogroup are transitions C16223T, T16325C and T16362C.27 Mx174 was the most frequent D1 haplotype found (three individuals) in the sample and had the characteristic polymorphisms plus the transitions A73G and A263G (Table 1). Considering only HVR1 polymorphisms, this haplotype has been found earlier in South America.41 The haplogroup D1 belongs to the branch of Asian haplogroup D4; however, the transition at position 16 325 is not found in D haplogroups from Asia.53, 57 The frequencies of haplogroup D1 in Amerindian populations are lower in Central America and higher in North and South America.15, 16, 58
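The motif-based assignment described for the four Amerindian haplogroups can be sketched as follows. This is an illustrative simplification, not the procedure used in the study: only the HVR1 motif positions named in the text are included, each variant is written as position plus observed base, position 16223 is assumed to carry T in A2 as it does in C1 and D1, and the scoring rule and the `min_score` threshold are hypothetical. A fractional score is used because some motif positions can be missing in a given haplotype, as with the back mutation at 16111 in part of the A2 sequences.

```python
# HVR1 motif positions named in the text; variants are "position + observed base".
MOTIFS = {
    "A2": {"16111T", "16223T", "16290T", "16319A", "16362C"},
    "B2": {"16189C", "16217C"},
    "C1": {"16223T", "16298C", "16325C", "16327T"},
    "D1": {"16223T", "16325C", "16362C"},
}


def classify(variants, min_score=0.75):
    """Return the best-matching haplogroup, or None if no motif matches well enough.

    The score is the fraction of a haplogroup's motif positions present in the
    sample; min_score is a hypothetical cut-off chosen for illustration.
    """
    best, best_score = None, 0.0
    for name, motif in MOTIFS.items():
        score = len(motif & variants) / len(motif)
        if score > best_score:
            best, best_score = name, score
    return best if best_score >= min_score else None


# Hypothetical samples (sets of differences from the reference sequence).
c1_like = classify({"16223T", "16298C", "16325C", "16327T"})
a2_back_mutation = classify({"16223T", "16290T", "16319A", "16362C"})  # lacks 16111T
b2_like = classify({"16183C", "16189C", "16217C"})
unassigned = classify({"16126C"})
```

Because the D1 motif shares two positions with C1 and one with A2, a real classifier would also need HVR2 and coding-region markers to resolve ambiguous cases.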

Genetic diversity of mitochondrial Amerindian haplogroups

Molecular diversity indexes were calculated for each Amerindian haplogroup (see Table 2). Interestingly, although haplogroup A2 was the most frequent in the whole sample, it had a significantly lower mean number of pairwise differences (4.7) and nucleotide diversity (0.027) compared with the B2 (5.8; 0.034) and C1 (5.9; 0.034) haplogroups (P<0.01, Student's t-test). The genetic diversity indexes of haplogroup A2 were higher than those of the haplogroup D1, but the differences were not significant (Table 2). The mismatch distribution was plotted for each Amerindian haplogroup and all haplogroups showed a unimodal distribution (data not shown). Moreover, the Tajima's D values calculated for haplogroups A2, B2, C1 and D1 were significantly negative (see Table 2). A significantly negative Tajima's D value together with a unimodal mismatch distribution suggests a demographic expansion.59, 60 The τ values calculated for haplogroups A2, B2, C1 and D1 were 2.7, 6.1, 5.1 and 1.8, respectively. Using τ and the mutation rate estimated for the mtDNA control region (HVR1: 1.64 × 10^−7 and HVR2: 2.29 × 10^−7),61 the time to demographic expansion calculated for the haplogroups A2, B2, C1 and D1 was 11.6, 25.9, 21.6 and 7.9 thousand years before present, respectively. Therefore, the lower genetic diversity present in the haplogroups A2 and D1 could be explained by a recent expansion of these haplogroups in the Mexican mestizo population.62 It is important to emphasize that these calculations in the present study were performed using data from the control region, which has a higher mutational rate compared with the coding region.63
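The conversion from τ to an expansion time can be sketched with the standard relation τ = 2ut, so that t = τ/(2u), where u is the mutation rate of the whole analyzed sequence. The sketch below assumes that the quoted rates are per site per year and that u is obtained by summing them over the HVR1 and HVR2 sites; neither assumption is stated in the text, and the variable names are hypothetical. Under these assumptions the results fall within a few percent of the reported times, but they do not reproduce them exactly.

```python
# Region lengths follow the sequenced coordinates given in the methods.
HVR1_SITES = 16365 - 16024 + 1  # 342 sites
HVR2_SITES = 340 - 73 + 1       # 268 sites

# Rates quoted in the text; treating them as per site per year is an assumption.
MU_HVR1 = 1.64e-7
MU_HVR2 = 2.29e-7

# Mutation rate of the whole analyzed sequence (assumed sum over sites).
u = HVR1_SITES * MU_HVR1 + HVR2_SITES * MU_HVR2

# tau values reported for each Amerindian haplogroup.
TAU = {"A2": 2.7, "B2": 6.1, "C1": 5.1, "D1": 1.8}


def expansion_time_years(tau, rate):
    """Time since expansion in years from tau = 2 * rate * t."""
    return tau / (2 * rate)


times_kya = {hg: expansion_time_years(t, u) / 1000 for hg, t in TAU.items()}

for hg, t in times_kya.items():
    print(f"{hg}: about {t:.1f} thousand years")
```

The two region lengths add up to the 610 nucleotides analyzed, which is a useful consistency check on the coordinates.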

Non-Amerindian haplogroups

European haplogroups present in the sample (16 individuals) were H (1.9%), J (0.7%), K (0.7%), T (0.7%), U (1.9%) and V (0.4%). The frequency of haplogroup H is similar to that reported earlier for Mexico City.18 Interestingly, this haplogroup is the most frequently found in Spain.34, 35 The low frequency of European mitochondrial haplogroups in Mexico is related to the rare mating between European women and Amerindian or mestizo men throughout Mexican history.2 The two individuals with African haplotypes belong to haplogroup L3. The frequency of African haplogroups found in the sample was lower than that found earlier (4.4%).17

Minimum spanning network topology of mitochondrial haplogroups

The individual sequences that had been assigned to one of the mitochondrial haplogroups by the control region motifs were represented in the minimum spanning network (Figure 2a). Individuals with the same haplogroup clearly cluster together. The network grouped Amerindian haplotypes into the four main haplogroups (A2, B2, C1 and D1), whereas the non-Amerindian haplotypes were separated into two different groups, one for the Europeans and the other for the Africans. All four Amerindian haplogroups represented in the network had a star-like pattern with a main internal node connected to other less frequent haplotypes (Figure 2a). This star-like pattern has been described earlier in Amerindian haplogroups28, 59 and also suggests population expansion.64 The A2 haplotypes were grouped into three main internal nodes. Two nodes have the transition C16111T but are separated by transition A153G. The third node includes those haplotypes with the back-transition T16111C. Haplogroup B2 is represented in the minimum spanning network by two main internal nodes with multiple connections to other haplotypes in a star-like topology. The two nodes were separated by transversion A16182C. It is noteworthy that B2 haplotypes are closely related to the European H haplogroup because both are derived from the ancestral R haplogroup, which belongs to the macrohaplogroup N.65 The C1 and D1 haplotypes had only one main internal node connected to other singletons.

Distribution of mitochondrial haplogroups by geographical region of birthplace

There was no significant difference in Amerindian ancestry among the individuals grouped by birthplace region (see Table 4). However, the samples from Mexico City (91.0%) and the coastal region (91.6%) showed the lowest frequencies of Amerindian ancestry, whereas the sample from the central region had the highest (95.7%). This agrees with the fact that most of the extant indigenous communities, such as the Nahuatl, Mazahua and Otomi, are concentrated in the central region of Mexico.66, 67 The frequency of haplogroup A2 was similar in all regions; however, the transition C16111T was significantly more frequent in A2 haplotypes of Mexico City (94.2%, 49 out of 52) than in the central region (75.4%, 43 out of 57; P<0.01, χ2). It is noteworthy that the frequency of the haplogroup B2 was significantly higher in the sample from the central region compared with Mexico City and the coastal region (P<0.01, χ2). This agrees with the higher frequency of B haplotypes in some indigenous populations from the central regions, such as Nahuatl (27–40%), or from the south, such as Mixe (33.3%).13, 15, 16 The data also suggest that the higher frequency of Amerindian ancestry in the central region is accounted for essentially by the increased frequency of B haplotypes. The transversion A16182C was more frequent in B2 haplotypes from the central region (55.5%, 15 out of 27 individuals) than in those from Mexico City (41.6%, 5 out of 12 individuals), but this difference was not significant. In contrast, the haplogroup C1 was more frequent in individuals from Mexico City (21%) than from the central (15.6%) and coastal regions (17%), but this difference was not significant. 
The lower frequency of C haplotypes in the central region agrees with the low frequency of this haplogroup in indigenous populations from the valley of Mexico.13 In contrast, half of the eight individuals from the northern region had this haplogroup, which agrees with the high frequency of this haplogroup in indigenous populations from the north, such as the Huichol (41%) and Tarahumara (38%) populations.13, 15, 16 The frequency of haplogroup D1 was similarly low in all geographical regions analyzed. More than half of the individuals with European haplogroups were concentrated in Mexico City, which is consistent with differences in the matrilineal ancestry in the country owing to the historical fact that European women settled mainly in Mexico City and the larger colonial cities.21
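The comparison of C16111T between A2 haplotypes from Mexico City (49 out of 52) and the central region (43 out of 57) can be worked through as a 2 × 2 χ2 test. The sketch below assumes an uncorrected Pearson statistic with one degree of freedom; the text does not say whether a continuity correction was applied in the statistical package, and the function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and the P value for 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: Mexico City, central region; columns: with C16111T, without C16111T.
chi2, p = chi_square_2x2(49, 52 - 49, 43, 57 - 43)
print(f"chi2 = {chi2:.2f}, P = {p:.4f}")
```

With these counts the statistic is about 7.3 and the P value is below 0.01, in line with the significance level reported for this comparison.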

Table 4 Mitochondrial haplogroup frequencies according to the geographical region of birthplace

Conclusions

The present work is the first survey to analyze the full mitochondrial control region in the Mexican mestizo population. Although the haplotype diversity was as high as in other populations, most of the haplotypes identified have not been reported earlier. A higher frequency of Amerindian ancestry was found in individuals from the central region of Mexico than in those from Mexico City and the coastal areas. These differences could be because of a heterogeneous settlement of indigenous communities in the country, with a higher concentration of these communities in the central region. Mexico City has a higher frequency of European haplogroups than other regions, in agreement with the historical fact that the majority of European women settled in this city. The Tajima's D neutrality test performed in this sample and the topology of the minimum spanning network suggest a demographic expansion of all Amerindian mitochondrial haplogroups in the Mexican mestizo population. The τ values calculated for these Amerindian haplogroups showed that the time to expansion was recent for haplogroups A2 and D1. The present knowledge will contribute to future forensic, evolutionary and demographic studies in Mexican mestizos. These data, in addition to other genetic markers, will be useful to estimate the ancestry proportion of the subjects in association studies to better select cases and controls and avoid the bias caused by population stratification.

Conflict of interest

The authors declare no conflict of interest.