The study of surnames, also called ‘the poor man's population genetics’,1 has a long tradition in genealogical research and was extensively used long before the term ‘genetics’ was first coined.2 Indeed, the fact that surnames are patrilineally inherited in many parts of the world,3, 4, 5 including Central Europe,2 implies that names should be of considerable interest to geneticists too. Surnames in combination with genetic studies have proved useful for describing population structures, for example, in France, Sicily and the Netherlands.6, 7, 8

However, Y-chromosomal DNA polymorphisms are ideally suited for studies of male demography themselves, and the availability of rapidly evolving markers on the Y chromosome has lately rendered the onomastics of human surnames an outsider discipline. Thus, the ability of hypervariable Y-chromosomal short tandem repeats (Y-STRs) to discriminate even between closely related and co-localized male populations has been demonstrated for Germans and Dutch,9 for the Baltic populations,10 for Central England and North Wales,11 and for Poland and Germany.12 At an even larger scale, the recent identification of previously unrecognized population strata in the Y-STR haplotype distribution of more than 12 000 males from 91 European localities13 has once more highlighted the usefulness of this approach. Nevertheless, only a small number of studies have so far addressed the actual relationship between the distribution of surnames and Y-chromosomal haplotypes.14, 15, 16, 17

Surnames vary substantially both between and within European countries. In Germany, for example, although the majority of the one million different surnames are typically German (eg ‘Müller’, ‘Schmidt’ or ‘Berger’), names with foreign roots are also abundant. The majority of the latter are of Slavic origin18 (approximately 20%) and many of them are easy to recognize by consonant combinations that are otherwise unfamiliar to the German language. Examples include the names of the German writer Kurt Tucholsky and of the second author of this article. In many cases, however, the foreign origin of a surname may not be immediately apparent, as is the case, for example, for the name of the 18th-century playwright Gotthold Ephraim Lessing.

The patrilineal inheritance of both surnames and Y chromosomes suggests that different strata of surnames should correspond to different strata of Y chromosomes. Since this relationship is likely to have become obscured not only by mutation but also by illegitimate births and the change of surnames, quantifying the residual correlation between the two characteristics would be of both theoretical and practical relevance. On the one hand, information about the history of patrilines is useful for the precise estimation of mutation rates and for the assessment of migration behaviours. On the other hand, surnames potentially provide a simple means of stratifying populations prior to Y-chromosomal analyses that target prehistoric events, thereby increasing their efficiency through a reduction in genotyping load. The aim of the present study was thus to assess the extent to which Y-STR haplotypes of German males, born and living in the region of Halle (Saale), are indicative of a German, Slavic or mixed German-Slavic descent of their surnames.

Materials and methods

DNA samples

DNA samples were obtained from 419 German males, born around Halle (Saale), located in the South-East of Germany (Figure 1), who identified themselves as Germans. An additional group of 29 German males were sampled from the Sorbish minority, a Slavic-speaking community living in the Lausitz area near the Polish border.19

Figure 1

Map of Germany showing the city of Halle (Saale).

Y-STR analysis

Eight Y-STR loci were analysed, namely DYS19, DYS385, DYS389I, DYS389II, DYS390, DYS391, DYS392 and DYS393. Locus information and PCR primer sequences can be found in Kayser et al20 or at the Y-STR Haplotype Reference Database (YHRD) web site. The YHRD nomenclature was used here in accordance with recommendations by the International Society for Forensic Genetics,18 designating Y-STR alleles by the number of repeats included. DNA was amplified in two multiplex reactions, following Elmoznino and Prinz.21 Consistent allele designation and genotyping quality were assured by the concurrent electrophoretic analysis of sequenced allelic ladders or sequenced reference DNA samples. PCR products were analysed by capillary electrophoresis using an ABI 310 Genetic Analyzer (Applied Biosystems, Weiterstadt, Germany) and the Genotyper software.22 DNA of the Sorbish males was analysed using the Mentype® Argus Y-MH PCR Amplification Kit (Biotype, Dresden).

Analysis of surnames

The Halle samples were divided into three subgroups, according to surname. Two larger groups comprised 195 males with surnames that were definitely German (‘G’) and 185 males with definitely Slavic surnames (‘S’). The third group contained 39 males with mixed German-Slavic surnames (‘M’). Samples of 29 Sorbs19 and some 1313 published haplotypes from Polish males13 were used for comparison. Surname groups were defined on the basis of spelling, using certain combinations of consonants and surname suffixes to categorize the origin of the name in question. Suffixes ‘-er’, ‘-mann’ and ‘-burg’, for example, are typically German whereas ‘-ke’, ‘-ka’, ‘-ow’ and ‘-ski’ are typically Slavic. In addition, the root morphemes of surnames were also examined. Examples of names with a Slavic root include ‘Lessing’, which sounds German but was derived from the Slavic expression for ‘forest settler’, and ‘Kafka’, which in Czech means ‘jackdaw’. Mixed surnames include both German and Slavic elements, that is, a German basis and a Slavic ending, or vice versa (eg ‘Wudtke’ or ‘Kuppke’). These surnames are the result of a long parallel usage of both German and Slavic languages in the eastern part of Germany.
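The suffix part of this classification can be illustrated with a short sketch. It is not the authors' procedure: it uses only the example suffixes quoted above, the function name `classify_by_suffix` is hypothetical, and it ignores consonant combinations and root morphemes. For that reason it cannot recognize mixed names such as ‘Wudtke’ or names with a hidden Slavic root such as ‘Lessing’.

```python
# Illustrative only: the suffix lists are the examples quoted in the text,
# not a complete onomastic rule set.
GERMAN_SUFFIXES = ("er", "mann", "burg")
SLAVIC_SUFFIXES = ("ke", "ka", "ow", "ski")


def classify_by_suffix(surname):
    """Return 'S', 'G' or 'unclassified' from the surname ending alone."""
    name = surname.lower()
    if name.endswith(SLAVIC_SUFFIXES):
        return "S"
    if name.endswith(GERMAN_SUFFIXES):
        return "G"
    # No listed suffix matches; a root-morpheme analysis would be needed.
    return "unclassified"


for surname in ("Berger", "Virchow", "Kafka", "Lessing"):
    print(surname, classify_by_suffix(surname))
```

A suffix rule of this kind assigns ‘Lessing’ to no group, which is why an examination of root morphemes was needed in addition.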

Statistical analysis

The genetic relationship between the German, Sorbish and Polish samples was assessed by Analysis of Molecular Variance (AMOVA) using ΦST, an analogue of Wright's FST that takes the evolutionary distance between individual Y-STR haplotypes into account.23, 24 The analysis was confined to the so-called ‘core’ haplotype, comprising all markers but DYS385. Marker DYS385 had to be excluded since its multilocal nature hampers the unambiguous assignment of evolutionary distances to allele pairs. Populations were recursively clustered by combining, in each step, that pair of samples or clusters that yielded the minimum global ΦST value for the core haplotype. Clustering was carried out until only one cluster remained. Estimates of pairwise and global ΦST values were obtained using the ARLEQUIN software25 with a single step mutation model, and tested for statistical significance by means of random permutation of samples in 10 000 replicates.13
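The clustering procedure can be sketched as follows. This is a minimal illustration and not the ARLEQUIN implementation. It assumes that the evolutionary distance between two haplotypes is the sum of squared repeat-number differences over loci (one common choice under a single-step mutation model) and that a candidate merge is scored by the ΦST between the two pooled groups. The function names and the toy single-locus data are hypothetical, and permutation testing is omitted.

```python
from itertools import combinations


def sq_dist(h1, h2):
    """Sum of squared repeat-number differences (assumed distance)."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2))


def ssd(haps):
    """Sum of squared deviations derived from pairwise distances."""
    return sum(sq_dist(a, b) for a, b in combinations(haps, 2)) / len(haps)


def phi_st(groups):
    """AMOVA-style Phi_ST for a list of groups of haplotypes."""
    sizes = [len(g) for g in groups]
    n_total, n_groups = sum(sizes), len(groups)
    pooled = [h for g in groups for h in g]
    ssd_within = sum(ssd(g) for g in groups)
    ssd_among = ssd(pooled) - ssd_within
    msd_within = ssd_within / (n_total - n_groups)
    msd_among = ssd_among / (n_groups - 1)
    # Average group size corrected for unequal sample sizes.
    n0 = (n_total - sum(s * s for s in sizes) / n_total) / (n_groups - 1)
    var_among = (msd_among - msd_within) / n0
    total = var_among + msd_within
    return var_among / total if total else 0.0


def cluster(samples):
    """Merge the pair with minimum Phi_ST until one cluster remains."""
    clusters = {name: list(haps) for name, haps in samples.items()}
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(sorted(clusters), 2),
            key=lambda p: phi_st([clusters[p[0]], clusters[p[1]]]),
        )
        score = phi_st([clusters[a], clusters[b]])
        clusters[f"({a}+{b})"] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, score))
    return merges


# Toy single-locus haplotypes, invented purely for illustration.
toy = {
    "A": [(14,), (15,)],
    "B": [(14,), (15,)],
    "C": [(20,), (21,)],
}
for a, b, score in cluster(toy):
    print(a, b, round(score, 4))
```

As in the estimate for groups G and M reported in the Results, ΦST can be slightly negative when two samples are no more different from each other than expected by chance.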


Results

In the 419 East-German males analysed in the present study, a total of 270 different Y-STR haplotypes were observed. While the most frequent haplotype occurred 10 times, 146 haplotypes were unique (data available from the authors upon request). Group G comprised 139 different surnames, 18 of which occurred twice. Five surnames were observed more than twice (3 × 3, 1 × 5, 1 × 6). There was only one instance in group G of a surname being shared by two males with the same haplotype. In group S, 177 different surnames occurred, four of which were found twice. No two males with the same surname had the same haplotype. Finally, no shared surnames and haplotypes were observed in group M.

Upon AMOVA, the core Y-STR haplotype distributions of males with German (‘G’) and mixed surnames (‘M’) were found to be indistinguishable (ΦST=−0.0008, P>0.5). The two samples were therefore combined into one group (‘G+M’). Note that this joint consideration of G and M was retrospectively justified in that an analysis of group G alone yielded virtually identical results (not shown). A highly significant difference emerged between the combined G+M group and the group of males with a Slavic surname (‘S’; see Table 1). The observed level of differentiation (pairwise ΦST=0.0277, P<0.001) between groups G+M and S was surprisingly large and approximates that seen between European populations separated by much larger geographical and linguistic distances (eg Cologne and Budapest). Cluster analysis based upon global ΦST (Figure 2) revealed that the Y-STR core haplotype distribution of the German S group is substantially closer to that of the Polish population than to that of the G+M group. The Sorbish males appear to be similarly close to both the S group and the Polish group, although their positioning in the tree may be less robust owing to small sample size.

Table 1 Pairwise Y-STR-based ΦST for central European males
Figure 2

Clustering of Central European male samples by global Y-STR-based ΦST.

In a recent study of European Y-STR haplotypes, several population clusters were identified; among them were clearly defined ‘Eastern European’ and ‘Western European’ groupings.13 Haplotypes from these fringe clusters, as well as their one-step neighbours, were classified as either ‘Western’ or ‘Eastern’, depending upon where they were more frequent. A similar characterization of the present samples in terms of the relative proportion of the fringe haplotypes resulted in highly significant differences between the two surname-defined German subgroups, G+M and S (χ2=13.094, 2 df, P=0.001). While 88 of the 234 haplotypes (38%) in the combined G+M group were classified as ‘Western’, this was the case for only 42 of the 185 haplotypes (23%) in group S. In contrast, 80 G+M haplotypes (34%) were of ‘Eastern’ type compared to 91 S haplotypes (49%). The proportion of unclassifiable haplotypes was 28% in both groups (66 in G+M, 52 in S).
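The reported test statistic can be reproduced from the counts given above. The sketch below computes Pearson's χ2 for the 2 × 3 table of haplotype classes by surname group. The variable and function names are hypothetical, and the P value uses the closed form exp(−χ2/2), which holds only for 2 degrees of freedom.

```python
import math

# Counts taken from the text: haplotype class by surname group.
observed = {
    "G+M": {"Western": 88, "Eastern": 80, "Unclassified": 66},
    "S": {"Western": 42, "Eastern": 91, "Unclassified": 52},
}


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom."""
    rows = list(table)
    cols = list(next(iter(table.values())))
    row_tot = {r: sum(table[r].values()) for r in rows}
    col_tot = {c: sum(table[r][c] for r in rows) for c in cols}
    n = sum(row_tot.values())
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / n
            stat += (table[r][c] - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    return stat, df


stat, df = chi_square(observed)
# Survival function of the chi-square distribution; exact for df == 2 only.
p_value = math.exp(-stat / 2)
print(f"chi2 = {stat:.3f}, df = {df}, P = {p_value:.4f}")
```

Nearly all of the statistic comes from the ‘Western’ and ‘Eastern’ columns; the unclassified counts are almost exactly as expected under independence.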

The seeming characterization of surname-defined male samples from Halle as either ‘Western’ or ‘Eastern’ was further corroborated by comparing the frequency of all haplotypes observed in groups G+M and S with the current release of YHRD (Release 15), comprising 17 214 haplotypes from 125 samples of European or Near-Eastern extraction (Figure 3). Males from group G+M shared the majority of their Y-STR haplotypes with western populations whereas the distribution in group S was closer to that of eastern, most notably Polish, populations. The proportion of haplotypes shared between group S and Polish males was higher than that with any other German sample.

Figure 3

Matches between the Y-STR Haplotype Reference Database and core Y-STR haplotypes of males with German or Mixed (top) and Slavic (bottom) surnames.


Discussion

How can the profound stratification observed among East-German male lineages and their correlation with surnames be best explained? Although the name ‘Germany’ appears to imply a homogeneous origin of the German people, the country has always been a gateway for migration, mostly from east to west. The best documented wave of migration was that of Eastern Germanic tribes and Slavs, driven by the Huns, that led to the downfall of the Roman Empire. In historic times, two major instances of assimilation of Slavic people into the German nation occurred. Around 950 AD, the German Empire started to put pressure upon the Slavic peoples inhabiting large areas of what was to become, in the mid-20th century, the German Democratic Republic.26 By 1100 AD, after more than 100 years of wars and proselytization, the complete area of contemporary Germany had come under the influence of the German Empire. During the following centuries, most of the non-Germanic tribes (like the Baltic Prussians) completely abandoned their language, and their descendants are today regarded as ‘typically German’. Only in a small area, southeast of Berlin, known as the Lausitz, did the Slavic-speaking Sorb people maintain their language and culture, and their descendants today represent the only recognized, non-immigrant minority in East Germany. In any case, the names of many cities, including Berlin (meaning ‘little swamp’), and some surnames, most notably those of ‘typically Prussian’ nature like ‘von Clausewitz’ or ‘Virchow’, still reflect the Slavic roots of this part of Germany. The second major assimilation of people with Slavic ancestry occurred during the Industrial Revolution in the 19th century. Thousands of people from Eastern Europe migrated to the West to work in the surging industrial areas of Germany (Silesia, Ruhr-Area). Although they brought their surnames with them, they nevertheless became culturally amalgamated quite rapidly by the German majority.

The Halle region is located exactly at the intersection of the Germanic and Slavic spheres of influence of the 10th century, but it is also a traditional mining and chemical industry area (Halle-Leipzig-Bitterfeld) that attracted Slavic workers during the Industrial Revolution. Both of these factors should have had an impact upon the male-specific genetic structure of the local population where surnames of Germanic and Slavic origin are about equally frequent. In terms of the relative importance of the two historic instances for the observed correlation between Y-STR haplotypes and surname characteristics, it is interesting to note that surnames first occurred in Europe in Venice during the 9th century. From there, the law of name bearing was adopted in France and Catalonia in the 11th, and in England, and Western and Southern Germany in the 12th century. In the North and East of Germany, the custom was practised no earlier than the 15th century and, in some rural regions, surnames became fashionable only in the 18th century, nearly 900 years after their first appearance in Europe.27, 28 Furthermore, surnames frequently changed or became modified until the beginning of the 19th century. Therefore, it appears unlikely that the correlation between surnames and Y-STR haplotypes observed in our study dates back to the Middle Ages; it is more likely to be the result of the immigration of industrial workers in the 19th century. In this respect, Central Europe appears to differ from England and Ireland where patrilineally inherited names are presumed to have a much deeper rooting.14, 15, 16, 17

Our results highlight the fact that the Y-chromosomal genetic structure of modern Central European populations is heterogeneous and that, particularly in East Germany, the concomitant strata may be resolvable by the consideration of surnames. This implies that future studies targeted at more ancient population movements inside or outside the region through the use of slowly evolving Y-chromosomal markers (ie SNPs) may gain efficiency from allotting the genotyping load according to surnames.