Genomic continuity of Argentinean Mennonites

Mennonites are Anabaptist communities that originated in Central Europe about 500 years ago. They initially migrated to different European countries, and in the early 18th century they established their first communities in North America, from where they moved to other American regions. We aimed to analyze an Argentinean Mennonite congregation from a genome-wide perspective by investigating >580,000 autosomal SNPs. Several analyses show that Argentinean Mennonites have European ancestry without signatures of admixture with non-European American populations. Among the worldwide datasets used for population comparison, the CEU, which is the best available surrogate for a Central European population in the 1000 Genomes Project, is the dataset showing the closest genomic affinity to the Mennonites. When compared to other European population samples, the Mennonites show higher inbreeding coefficient values. Argentinean Mennonites show signatures of genetic continuity with no evidence of admixture with Americans of Native American or sub-Saharan African ancestry. Their genomes indicate increased endogamy compared to other Europeans, most likely mirroring a lifestyle that involves small communities and historical consanguineous marriages.


Results
Identity-by-descent, identity-by-state, and multidimensional scaling analysis. Analyses of identity-by-descent (IBD) values were carried out in order to detect close relationships between individuals that could not be detected from the genealogical information provided by the donors. This analysis allowed the identification of a few cryptic relationships (Fig. 1A). F-inbreeding coefficients (fixation indexes) were obtained for Mennonites and for reference European samples from the 1000 Genomes Project (hereafter 1000G). Mennonites show the highest fixation index when compared to Europeans (Fig. 1B), and this difference is statistically significant (Mann-Whitney test, P-value < 0.001).
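As an illustration of what an F-inbreeding coefficient measures, the sketch below uses the common method-of-moments estimator, F = (O - E) / (N - E), where O is the observed number of homozygous genotypes, E the number expected under Hardy-Weinberg proportions, and N the number of genotyped SNPs. This is an assumption made for illustration: the text does not state which estimator was used, and the function name and input layout are hypothetical.

```python
def inbreeding_f(genotypes, allele_freqs):
    """Method-of-moments inbreeding coefficient for one individual.

    genotypes: alternate-allele counts per SNP (0, 1 or 2; None = missing).
    allele_freqs: alternate-allele frequency per SNP in the reference sample.
    Both inputs are hypothetical and chosen only for this sketch.
    """
    observed_hom = 0      # O: observed homozygous genotypes
    expected_hom = 0.0    # E: expected homozygous genotypes under HWE
    n_called = 0          # N: non-missing genotypes
    for g, p in zip(genotypes, allele_freqs):
        if g is None:
            continue
        n_called += 1
        if g != 1:
            observed_hom += 1
        expected_hom += 1.0 - 2.0 * p * (1.0 - p)
    if n_called == expected_hom:
        raise ValueError("no polymorphic SNPs: F is undefined")
    return (observed_hom - expected_hom) / (n_called - expected_hom)


if __name__ == "__main__":
    freqs = [0.5, 0.5, 0.5, 0.5]
    print(inbreeding_f([0, 2, 0, 2], freqs))  # fully homozygous profile
    print(inbreeding_f([0, 1, 2, 1], freqs))  # matches HWE expectation
```

Positive values indicate an excess of homozygosity relative to Hardy-Weinberg expectations, which is the pattern reported here for the Mennonites.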
The multidimensional scaling (MDS) plot based on identity-by-state (IBS) values (Fig. 2A) displays three main clusters located at the vertices of a triangle, representing the three main continental groups: Europe, Africa and Asia. While Dimension 1 (accounting for ~15% of the variation) separates African profiles from European and Asian ones, Dimension 2 (~6%) separates Asian from European and African profiles. A number of samples from different population groups fall in between these three main clusters, reflecting two- or three-way admixture. Mennonite variation falls entirely within the European cluster, with no evidence of admixture with other continental groups. Within Europe, Mennonites cluster together with the CEU and the GBR samples (Fig. 2B).
The dendrogram in Fig. 2C places the Mennonites within the European cluster, with the CEU being the most closely related population set. Average IBS distances (Fig. 2D) between Mennonites and reference populations indicate that Mennonites are closely related to other Europeans from 1000G. In particular, the CEU is the population with the closest genetic affinity to the Mennonites.
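To make the IBS-based comparisons concrete, the following sketch computes the proportion of alleles shared identical-by-state between two individuals and the mean IBS distance between two population samples. It assumes genotypes coded as alternate-allele counts and averages over all between-population pairs of individuals; both choices, and all function names, are assumptions made for illustration and not a description of the exact procedure used in the study.

```python
def ibs_similarity(g1, g2):
    """Proportion of alleles shared identical-by-state by two individuals.

    g1, g2: alternate-allele counts per SNP (0, 1 or 2; None = missing).
    A pair of genotypes shares 2 - |g1 - g2| alleles at each SNP.
    """
    shared = 0
    n_snps = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        shared += 2 - abs(a - b)
        n_snps += 1
    if n_snps == 0:
        raise ValueError("no SNPs genotyped in both individuals")
    return shared / (2 * n_snps)


def mean_ibs_distance(pop_a, pop_b):
    """Average of (1 - IBS similarity) over all pairs drawn from two samples."""
    distances = [1.0 - ibs_similarity(x, y) for x in pop_a for y in pop_b]
    return sum(distances) / len(distances)


if __name__ == "__main__":
    mennonites = [[0, 1, 2, 0], [0, 1, 1, 0]]   # toy genotypes, not real data
    reference = [[0, 1, 2, 1], [2, 1, 0, 2]]
    print(mean_ibs_distance(mennonites, reference))
```

A matrix of such pairwise distances is the kind of input that classical MDS (for example `cmdscale` in R) takes to produce a plot like the one described above.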
F_ST values were also computed between Mennonites and the different datasets. In good agreement with the IBS analysis, the data indicate that the Mennonites are closely related to the European datasets. The highest F_ST values were found with Africans and Asians (Figure S1). An MDS plot based on F_ST values reasserts this observation, while a dendrogram places the Mennonites in the European cluster, on a branch separate from the other European datasets used (Figure S1).
Additional analyses carried out using other European population sets from Behar et al. 18 and the Human Genome Diversity Project 19 (data not shown) also indicate that the CEU is the population most closely related to the Mennonites.
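For readers unfamiliar with F_ST, the sketch below computes a simple two-population estimate from allele frequencies as (H_T - H_S) / H_T, where H_S is the mean within-population expected heterozygosity and H_T the expected heterozygosity of the pooled population, combined across SNPs as a ratio of sums. The estimator is an assumption chosen for clarity: the text does not specify which one was used, and this version ignores sample-size corrections.

```python
def fst_two_populations(freqs_a, freqs_b):
    """Simple F_ST between two populations from per-SNP allele frequencies.

    freqs_a, freqs_b: frequency of the same allele at each SNP in each
    population. Populations are weighted equally; no sample-size correction.
    """
    numerator = 0.0    # sum over SNPs of (H_T - H_S)
    denominator = 0.0  # sum over SNPs of H_T
    for pa, pb in zip(freqs_a, freqs_b):
        p_bar = (pa + pb) / 2.0
        h_total = 2.0 * p_bar * (1.0 - p_bar)
        h_within = (2.0 * pa * (1.0 - pa) + 2.0 * pb * (1.0 - pb)) / 2.0
        numerator += h_total - h_within
        denominator += h_total
    if denominator == 0.0:
        return 0.0  # all SNPs monomorphic in the pooled sample
    return numerator / denominator


if __name__ == "__main__":
    print(fst_two_populations([0.2, 0.5], [0.8, 0.5]))  # toy frequencies
```

Values near 0 indicate very similar allele frequencies, while values approaching 1 indicate strong differentiation, which is why the largest values are expected in comparisons with African and Asian datasets.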
Admixture analysis of Mennonite genomic profiles. Admixture analysis was carried out using a selection of 1000G population sets that represent the main continental ancestries. With the selected populations, K = 4 yielded the lowest cross-validation value; this analysis classifies profiles into four main groups that coincide with Europeans, Native Americans (represented here by Peruvians), East Asians, and sub-Saharan Africans (Fig. 3A). It allocates most of the Mennonite genomic variation (99.7% on average) to European ancestry. PCAdmix yields virtually the same results (Fig. 3B,C), with ~97% of the ancestry allocated to the European cluster.

D-statistics.
The f3-statistic test determines whether the relationships between three populations can be explained with or without admixture, while the f4-statistic adds information about the direction of the gene flow. We carried out these analyses in order to explore whether the Mennonite profiles could be explained by recent admixture between European variation and African and/or Native American variation. Both f3-statistics and f4-statistics showed no evidence of admixture (Fig. 4).

Discussion and Conclusions
The present study represents the first attempt to explore genome-wide SNP variation in Mennonites. Previous efforts were carried out on uniparental markers, using Y-STRs and mtDNA control region variation 3,9,10. Uniparental markers, however, have some limitations for the interpretation of historical admixture when explored individually.
The historical routes followed by Mennonite communities once they arrived in America from European countries exposed them to population scenarios that favored admixture with locals of African and Native American ancestry. Our data indicate, however, that the variation observed in Mennonites is virtually of 100% European ancestry. The Native American ancestry identified by ADMIXTURE and PCAdmix, apart from being very low, is most likely due to the presence of a European component in the Native American populations used as references (see barplot of PEL in Fig. 3). In fact, ADMIXTURE and PCAdmix analyses using other reference Native American populations, e.g. Aymara and Quechua samples from Reich et al. 20, corroborate a virtually 0% Native American component in the genome of Mennonites (Figure S2). Therefore, this residual non-European ancestry cannot be considered to be real. The results obtained from the D-statistics (f3- and f4-tests) corroborate the lack of admixture between Argentinean Mennonite profiles and African and/or Native American variation.
According to historical documentation, the ultimate homeland of Argentinean Mennonites was the Netherlands. Among the set of reference populations used for comparisons, the CEU (from 1000G) is the population set that best represents a Central European population. Not surprisingly, it is the CEU that appears in the results as the population most closely related to the Mennonites. This suggests that, despite the complex routes followed by Mennonites across America, from North to South, and the different populations they were in contact with, they have preserved their ancestral European ancestry. This is also in good agreement with the increased consanguinity observed in the Argentinean congregation compared to other European groups, which is favored by low effective population sizes, limited gene flow, and endogamous marriages.
Overall, the results obtained in the present study fit well with those obtained in the analysis of the Y chromosome of male Argentinean Mennonites, particularly in two main aspects. First, ancestry in both the autosomes and the uniparental markers is virtually 100% European. Second, the Y-chromosome profiles analyzed in Toscanini et al. 9 already suggested the existence of endogamy in Mennonites.
The results of the present study cannot directly identify admixture that could have occurred in America with non-Mennonites of European ancestry. However, the genomes of Argentinean Mennonites seem to show more affinity with Central Europeans than with, e.g., Mediterranean Europeans (represented by Spanish from 1000G) or Atlantic Europeans (represented by Great Britain from 1000G). This would indirectly suggest limited interbreeding of Mennonites with non-Mennonite American communities. Finally, our study focused on one Argentinean Mennonite congregation, and therefore extrapolation of the present results to other American congregations should be made with caution. It is important to note however that Argentinean Mennonites have [...] In summary, the variation observed in Argentinean Mennonites is consistent with their known lifestyle and mating rules. These demographic features together set the ground for higher historical consanguinity and could also explain a higher incidence of certain Mendelian diseases in these small communities 15.

Sampling.
A total of 27 saliva samples were collected using Oragene DNA collection kits (DNAgenotek) from individuals belonging to the Mennonite congregation of 'La Nueva Esperanza' (La Pampa, Argentina). Unrelated males from this DNA collection were previously analyzed for Y-chromosome markers 9.
We obtained written informed consent from all donors prior to the research, including consent for publication of individual data. The rights of participants were safeguarded during the research and their identity was protected. The study conforms with all applicable Spanish regulations, namely the Biomedical Research Act [...] individuals had less than 0.3% of missing alleles. The quality of the genotyping was very high; only 7, 43, and 602 variants showed 10%, 5%, and 1% of missing data, respectively. From each pair of related individuals, the one with the highest proportion of missing data was eliminated (see below). A total of five donors were finally excluded from the analysis because they showed cryptic relatedness.
Reference populations. Several genome-wide SNP repositories from diverse human populations were intersected with the SNP data obtained from the Mennonites (Supplementary Data Table S1). The 1000G data provide the largest SNP overlap with the Mennonite data. The 1000G data were retrieved from the original repository as done in Pardo-Seco et al. 22; these analyses involved 565,777 SNPs. In our figures and text we renamed the Iberian sample, IBS, as IBR in order to avoid confusion with the identity-by-state acronym. In order to explore the existence of a Native American component in the genome of Mennonites, we also used the masked dataset from Reich et al. 20. This dataset, merged with the 1000G data and the Mennonite data, yielded a total of 99,111 overlapping SNPs.
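The exclusion rule described above (drop, from each related pair, the individual with more missing data) can be sketched as follows. The input layout, the function name, and the relatedness cutoff of 0.4 are hypothetical choices made for this illustration; the text only states that first-degree relatives correspond to IBD values around 0.5.

```python
def prune_related(pairs, missing_rate, pi_hat_threshold=0.4):
    """Return the set of individuals to exclude because of relatedness.

    pairs: iterable of (id1, id2, pi_hat) tuples, one per pair of individuals.
    missing_rate: dict mapping individual id -> proportion of missing genotypes.
    pi_hat_threshold: hypothetical cutoff above which a pair counts as related.
    """
    removed = set()
    # Process the most closely related pairs first.
    for id1, id2, pi_hat in sorted(pairs, key=lambda t: -t[2]):
        if pi_hat < pi_hat_threshold:
            continue
        if id1 in removed or id2 in removed:
            continue  # this pair is already broken by an earlier exclusion
        # Drop the member of the pair with the higher missing-data proportion.
        worse = id1 if missing_rate[id1] > missing_rate[id2] else id2
        removed.add(worse)
    return removed


if __name__ == "__main__":
    pairs = [("A", "B", 0.50), ("B", "C", 0.49), ("C", "D", 0.05)]
    missing = {"A": 0.001, "B": 0.003, "C": 0.002, "D": 0.000}
    print(prune_related(pairs, missing))  # toy identifiers, not real donors
```

Processing pairs greedily means that removing one individual can resolve several related pairs at once, which keeps the number of excluded donors small.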

Statistical analysis.
A number of analyses were carried out using PLINK (ref. 23). First, we computed IBD values from the SNP data in order to test the Mennonite sample for potential close relationships between individuals, following the procedure detailed in Gómez-Carballa et al. 24. IBD values revealed a few close relationships among the Mennonites sampled. For this reason, we excluded one individual from each parent-offspring and sibling-sibling pair, which correspond to IBD values around 0.5 (refs 25 and 26). MDS was carried out on a matrix of pairwise individual IBS values with the aim of exploring clusters of genetic variation in the population sets analyzed. MDS was performed using the function cmdscale (library stats) from R (http://www.r-project.org). Average IBS distances (used in Fig. 2C,D) were computed as the average of the [...]
Admixture in Mennonite genomes was explored using two different approaches. Firstly, we used the ADMIXTURE software 27, which performs maximum likelihood estimation of individual ancestries from multi-locus SNP data. Secondly, we used PCAdmix 1.0 (ref. 28) to explore local ancestry assignment of ancestry-specific haplotypes across the genome. In order to create input files for PCAdmix, unphased SNP data were imputed and haplotypes were built using Beagle 3.3.2 (ref. 29). This analysis was carried out using reference populations from 1000G representing the main continental ancestries: CEU (Europeans), YRI (sub-Saharan Africans), PEL (Native Americans) and CHB (East Asians). We assigned genomic segments to the three main ancestries following a posterior probability threshold of 0.8. Because the Peruvian population set from 1000G has a proportion of European ancestry, as an alternative we also ran ADMIXTURE and PCAdmix using the Aymara and Quechua populations from Reich et al. 20 as the Native American dataset. The results of all these admixture analyses were in agreement with each other.
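The posterior probability threshold used for local ancestry assignment can be illustrated with a short sketch: a genomic window is assigned to an ancestry only if the most probable ancestry reaches the threshold, and otherwise it is left unassigned. The data layout (one dictionary of posterior probabilities per window), the ancestry labels, and the choice to compute proportions over assigned windows only are assumptions made for this example; they do not describe the actual PCAdmix output format.

```python
def call_window_ancestry(posteriors, threshold=0.8):
    """Return the ancestry label of a window, or None if below the threshold.

    posteriors: dict mapping ancestry label -> posterior probability
    for one genomic window (hypothetical layout).
    """
    best = max(posteriors, key=posteriors.get)
    return best if posteriors[best] >= threshold else None


def ancestry_proportions(windows, threshold=0.8):
    """Proportion of confidently assigned windows per ancestry."""
    counts = {}
    for posteriors in windows:
        label = call_window_ancestry(posteriors, threshold)
        if label is not None:
            counts[label] = counts.get(label, 0) + 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    windows = [  # toy posteriors, not real results
        {"EUR": 0.95, "AFR": 0.03, "NAM": 0.02},
        {"EUR": 0.60, "AFR": 0.30, "NAM": 0.10},  # ambiguous: left unassigned
        {"EUR": 0.90, "AFR": 0.05, "NAM": 0.05},
    ]
    print(ancestry_proportions(windows))
```

A stricter threshold leaves more windows unassigned but reduces the chance that an ambiguous segment is counted as non-European ancestry.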
The 3-population test or f3-statistics of the form f3(CEU, X_AFR; MEN) and f3(CEU, X_NAM; MEN) were computed in order to test for genetic admixture between different African (X_AFR) and Native American (X_NAM) population sets and the Mennonites (MEN). A non-negative mean would indicate that the profile of Mennonites cannot be explained by recent admixture with Africans or Native Americans. In order to determine the relationship between the Mennonite SNP profiles and the different population sets 20,30, we further computed a 4-population test or f4-statistics (ref. 31). This is a formal test for admixture that measures allele frequency correlations among populations; it provides statistical evidence of admixture and information on the directionality of the gene flow. The f4-statistic was computed using the weighted block jackknife procedure (block size of 5 Mb) (ref. 32). An outgroup is needed for the computation of f4-statistics. We used an outgroup that is symmetrically related to all modern human population groups, obtained by creating an individual profile possessing the ancestral alleles at all sites. This artificial outgroup ensures that there is no differential gene flow between the outgroup and the population sets used. More details on how this outgroup was built can be found in Pardo-Seco et al. 16.
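The logic of these tests can be sketched from allele frequencies. For a target population C and two sources A and B, f3 averages (c - a)(c - b) over SNPs, and it becomes negative when the target frequencies tend to lie between those of the sources, as expected under admixture. The f4-statistic averages (a - b)(c - d) over SNPs. The sketch below is a simplified illustration under stated assumptions: it omits the finite-sample bias correction applied by standard software, and it uses an unweighted delete-one-block jackknife in place of the weighted procedure cited in the text. All function names are hypothetical.

```python
import math


def f3_per_snp(target, source1, source2):
    """Per-SNP terms (c - a)(c - b); their mean is the (uncorrected) f3."""
    return [(c - a) * (c - b) for c, a, b in zip(target, source1, source2)]


def f4_per_snp(pop_a, pop_b, pop_c, pop_d):
    """Per-SNP terms (a - b)(c - d); their mean is f4(A, B; C, D)."""
    return [(a - b) * (c - d) for a, b, c, d in zip(pop_a, pop_b, pop_c, pop_d)]


def block_jackknife(per_snp_values, block_ids):
    """Mean and unweighted delete-one-block jackknife standard error.

    block_ids: one block label per SNP (e.g. an index of its genomic block).
    """
    blocks = {}
    for value, block in zip(per_snp_values, block_ids):
        s, n = blocks.get(block, (0.0, 0))
        blocks[block] = (s + value, n + 1)
    g = len(blocks)
    if g < 2:
        raise ValueError("at least two blocks are required")
    total_sum = sum(s for s, _ in blocks.values())
    total_n = sum(n for _, n in blocks.values())
    estimate = total_sum / total_n
    # Re-estimate the mean with each block removed in turn.
    leave_one_out = [(total_sum - s) / (total_n - n) for s, n in blocks.values()]
    loo_mean = sum(leave_one_out) / g
    variance = (g - 1) / g * sum((x - loo_mean) ** 2 for x in leave_one_out)
    return estimate, math.sqrt(variance)


if __name__ == "__main__":
    men = [0.30, 0.55, 0.10, 0.80]   # toy allele frequencies, not real data
    ceu = [0.28, 0.50, 0.12, 0.78]
    yri = [0.70, 0.10, 0.60, 0.20]
    estimate, se = block_jackknife(f3_per_snp(men, ceu, yri), [0, 0, 1, 1])
    print(estimate, se)
```

Dividing the estimate by its jackknife standard error gives a Z-score; a clearly negative Z-score for f3 would be evidence of admixture in the target, whereas non-negative values are consistent with the absence of admixture reported here.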