Introduction

In recent years, various national registries of volunteer bone marrow donors have been analysed increasingly often,1,2,3,4,5,6,7,8 stimulated by the growing number of registered donors, the diversity of their geographical origins and the accuracy of the haplotype definitions in HLA genetic systems. These analyses are also required because of the increasing number of transplants and the need to improve the recruitment of volunteer donors, particularly by identifying the areas where specific donors are most likely to be found and where rare haplotypes occur more often than expected by chance.

In 1995, Lonjou et al9 carried out the first study of the French National Registry of Volunteer Bone Marrow Donors. Their study estimated haplotype frequencies for the 20 administrative regions of France and thus enabled them to identify the regions where uncommon haplotypes have the highest probability of being found. The donors' geographic origins, however, were assigned very approximately, based as they were only on the area or town where the typing laboratory was located. The high mobility of French residents in recent decades means that this crude definition of origin can be highly erroneous. Most people were registered, at birth, in a town with a maternity ward or in a city where their parents moved to find work, and not necessarily where their relatives or other people likely to share haplotypes with them are settled. The best way to find the donors' true origin would be to determine the genealogical history of each donor, to maximise the chances of finding shared genetic structures, but the size of the registry (currently more than 100 000 volunteer donors) makes this option unfeasible.

As an alternative, we propose the use of donor surnames to increase the accuracy of the inference of donors' geographical origins. The basic idea is that (i) surnames are transmitted through the father's line, together with the genes of the Y chromosome, (ii) most (but not all) surnames have a fairly precise geographic origin, and (iii) the origins of male and female surnames are highly correlated, since the marriage circle in recent decades in France has not exceeded 5 km on average.10 In this study, we used the surnames of male donors to estimate haplotype frequencies for the 90 French departments (Figure 1 reports the department names and numbers), instead of the 20 regions (as in Lonjou et al9).

Figure 1

Departmental distribution of the minimum number of HLA-A, B haplotypes needed to reach 50% of the total frequency (Corsica, dept. no. 20, is displaced from its real geographical position). Department numbers: 1=Ain, 2=Aisne, 3=Allier, 4=Alpes Hte Provence, 5=Hautes Alpes, 6=Alpes Maritimes, 7=Ardèche, 8=Ardennes, 9=Ariège, 10=Aube, 11=Aude, 12=Aveyron, 13=Bouches du Rhône, 14=Calvados, 15=Cantal, 16=Charente, 17=Charente Maritime, 18=Cher, 19=Corrèze, 20=Corsica, 21=Côte d'Or, 22=Côtes d'Armor, 23=Creuse, 24=Dordogne, 25=Doubs, 26=Drôme, 27=Eure, 28=Eure et Loir, 29=Finistère, 30=Gard, 31=Haute Garonne, 32=Gers, 33=Gironde, 34=Hérault, 35=Ille et Vilaine, 36=Indre, 37=Indre et Loire, 38=Isère, 39=Jura, 40=Landes, 41=Loir et Cher, 42=Loire, 43=Haute Loire, 44=Loire atlantique, 45=Loiret, 46=Lot, 47=Lot et Garonne, 48=Lozère, 49=Maine et Loire, 50=Manche, 51=Marne, 52=Haute Marne, 53=Mayenne, 54=Meurthe et Moselle, 56=Morbihan, 57=Moselle, 58=Nièvre, 59=Nord, 60=Oise, 61=Orne, 62=Pas de Calais, 63=Puy du Dôme, 64=Pyrénées atlantiques, 65=Hautes Pyrénées, 66=Pyrénées orientales, 67=Bas Rhin, 68=Haut Rhin, 69=Rhône, 70=Haute Saône, 71=Saône et Loire, 72=Sarthe, 73=Savoie, 74=Haute Savoie, 75=Paris, 76=Seine maritime, 77=Seine et Marne, 78=Yvelines, 79=Deux Sèvres, 80=Somme, 81=Tarn, 82=Tarn et Garonne, 83=Var, 84=Vaucluse, 85=Vendée, 86=Vienne, 87=Haute Vienne, 88=Vosges, 89=Yonne, 90=Belfort.

Materials and methods

Donors, HLA and surnames

The French file of volunteer bone marrow donors contains 40 104 men, all serologically defined for HLA-A and HLA-B. Among them, 25 455 are also defined for both HLA-DRB1 and HLA-DQB1, and 22 211 only for HLA-DRB1.

We knew the name and location of the tissue-typing laboratory for each donor as well as his surname. Women were excluded from the analysis because it was impossible to be certain whether they were registered under their father's or their husband's surname.

Information about French surname distribution is based on data obtained from the ‘Births Registry’ provided by the Institut National de la Statistique et des Etudes Economiques (INSEE), the French agency that collects, analyses, and disseminates economic and statistical information. This register contains, for each of the 36 000 communes in France and for every surname, the number of infants born during two separate periods (1891–1915, P1, and 1916–1940, P2) and alive in 1970. Data from towns and smaller municipalities in the same department were pooled, and surname frequencies were calculated for each of the 90 French departments.

Several investigations ultimately revealed that some donor surnames were absent from the INSEE registry for the first period (P1). These names represent immigrants into France since the First World War, from more than 40 different countries. To minimise bias in estimating haplotype frequencies of the French population, they were excluded from the present analysis; further detailed analyses will be devoted to them. Of the donor surnames listed in the INSEE register, a few were found in only one department and thus have an unambiguous geographical origin. Even though most surnames are present in more than one department, they may nevertheless be informative if the geographic centre of their maximum frequency is well localised and if this frequency decreases rapidly with distance from that centre.

Most French surnames, however, are widespread in France (eg, Martin, Bernard, Durand, Dupont) or are present in several departments without a clear centre of origin.11 The present method (see the Surname methods section) allows us to overcome this difficulty by providing the probability that a given surname comes from a given department, in view of its frequency in this department and in others.

The INSEE register included 17 971 donor surnames (names thus long established in France) that corresponded to a total of 30 777 donors. These donors corresponded to 8558 different HLA-A, B phenotypes and to 13 565 different HLA-A, B, DR,DQ phenotypes (Table 1).

Table 1 Distribution of the numbers of donors and surnames by typed phenotypes

Typing methods

Since laboratories differ substantially with respect to the level of definition of broad and split alleles, this study concerns only the antigens shown in Table 2. Donors with different antigens were excluded from the analysis, that is, donors typed as A9 (broad antigen) but not split into A23 or A24. The ‘blank’ allele (bl) was allowed for each locus.

Table 2 List of alleles studied

The HLA typing was performed according to the French Registry Quality Assurance Procedure, with HLA class I typing defined according to serological or molecular biology techniques. When the serological result was blank or uncertain, a DNA confirmatory typing was required. The HLA class I type was checked before the class II typing, which was performed with molecular biology techniques. The Transplant Centre performs confirmatory HLA class II typing when the donor is preselected for donation.

Surname methods

Surnames were used to specify the geographical origin of donors; origin was restricted, however, to one of the 90 French departments. Pjk, the probability that phenotype j came from department k, was estimated as a weighted sum over surnames, in which fik is the frequency of the ith surname in the kth department and ωijk is a weight that takes into account the presence (δijk=1) or absence (δijk=0) of phenotype j in any person with surname i in department k, together with the a priori probability πjk of phenotype j in department k. We summed over all surnames and standardised by the sum of the weights over all phenotypes.
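Read literally, the verbal description above corresponds to an expression of the following form. This is a reconstruction from the wording alone, so the exact form of the weight and the placement of the normalisation are assumptions, not the authors' stated formula:

```latex
P_{jk} \;=\; \frac{\sum_{i} f_{ik}\,\omega_{ijk}}{\sum_{j'}\sum_{i} f_{ik}\,\omega_{ij'k}},
\qquad
\omega_{ijk} \;=\; \delta_{ijk}\,\pi_{jk}
```

Under this reading, if every phenotype has the same a priori probability within a department, the term πjk cancels, and Pjk reduces to the share of the department's surname frequency that is carried by surnames in which phenotype j was observed.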

Once the phenotype frequency Pjk is obtained, it can be substituted back into the formula as a new estimate of the a priori probability, and the final estimate of Pjk can be obtained by iterating until its value converges. The purpose of this work, however, was to look for rare haplotypes, so we preferred to postulate that each phenotype has an equal a priori probability in each department. This choice tends to overestimate very slightly the frequencies of rare phenotypes compared with those of common phenotypes, which suits that purpose.

By applying the formula, we could attribute to each of the 8558 HLA-A,B phenotypes and each of the 13 565 HLA-A,B,DR,DQ phenotypes (and thus to each donor) a vector of 90 probabilities of origin, one probability per department.
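The calculation can be illustrated with a minimal Python sketch. It assumes one possible reading of the method: the weight is the product of the presence indicator and the a priori probability, and the result is normalised over phenotypes within each department. The function name `phenotype_probabilities`, the simplification that the presence indicator does not vary by department, and the toy numbers are all hypothetical and are not taken from the study.

```python
import numpy as np

def phenotype_probabilities(f, delta, prior=None):
    """Sketch of a surname-weighted phenotype probability per department.

    f     : (n_surnames, n_departments) surname frequencies f_ik
    delta : (n_surnames, n_phenotypes) 1 if phenotype j was seen in a donor
            with surname i, else 0 (assumed not to vary by department)
    prior : (n_phenotypes, n_departments) a priori probabilities pi_jk;
            equal for all phenotypes when omitted
    Returns P with shape (n_phenotypes, n_departments); each column sums to 1.
    """
    f = np.asarray(f, dtype=float)
    delta = np.asarray(delta, dtype=float)
    n_phen, n_dep = delta.shape[1], f.shape[1]
    if prior is None:
        prior = np.full((n_phen, n_dep), 1.0 / n_phen)
    # Sum over surnames of f_ik * delta_ij, for every phenotype and department
    support = delta.T @ f
    weighted = support * prior
    # Standardise by the total over all phenotypes in each department
    totals = weighted.sum(axis=0, keepdims=True)
    return np.divide(weighted, totals, out=np.zeros_like(weighted), where=totals > 0)

# Toy example: 3 surnames, 2 departments, 2 phenotypes
f = [[0.5, 0.0],
     [0.5, 0.2],
     [0.0, 0.8]]
delta = [[1, 0],
         [1, 1],
         [0, 1]]
P = phenotype_probabilities(f, delta)
print(P.round(3))
```

Row j of `P` holds the values of phenotype j across departments, which is the kind of per-phenotype vector described above (90 entries in the study, 2 in this toy case).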

Statistical methods

We estimated the haplotype frequencies in each department from the phenotypic frequencies provided by the method described above. These estimations were made with an expectation-maximisation algorithm, an iterative procedure that yields maximum-likelihood estimates of haplotype frequencies from multilocus genotype data with unknown gametic phase. This procedure is implemented in the ‘Arlequin’ software programme.12 Hardy–Weinberg equilibrium of the HLA haplotypes could not be tested because of the presence of the recessive ‘blank’ allele.
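The expectation-maximisation idea can be sketched for two loci as follows. This is an illustrative toy version, not Arlequin's implementation: it assumes codominant alleles, no ‘blank’ allele, unweighted individual genotypes, and Hardy–Weinberg proportions for the haplotype pairs. The function name `em_haplotypes` and the example data are hypothetical.

```python
from collections import defaultdict

def em_haplotypes(genotypes, n_iter=200, tol=1e-10):
    """Estimate two-locus haplotype frequencies from unphased genotypes.

    genotypes : list of ((a1, a2), (b1, b2)), the two alleles at each locus.
    Returns a dict {(a, b): estimated frequency}.
    """
    def phasings(genotype):
        (a1, a2), (b1, b2) = genotype
        # The two possible ways of pairing alleles into haplotypes
        return list({tuple(sorted([(a1, b1), (a2, b2)])),
                     tuple(sorted([(a1, b2), (a2, b1)]))})

    compatible = [phasings(g) for g in genotypes]
    haplotypes = sorted({h for ps in compatible for pair in ps for h in pair})
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}
    n = len(genotypes)

    for _ in range(n_iter):
        counts = defaultdict(float)
        for ps in compatible:
            # E-step: posterior probability of each phasing given current frequencies
            w = [freq[h1] * freq[h2] * (1 if h1 == h2 else 2) for h1, h2 in ps]
            total = sum(w)
            for (h1, h2), wi in zip(ps, w):
                counts[h1] += wi / total
                counts[h2] += wi / total
        # M-step: expected haplotype counts divided by the number of chromosomes
        new_freq = {h: counts[h] / (2 * n) for h in haplotypes}
        change = max(abs(new_freq[h] - freq[h]) for h in haplotypes)
        freq = new_freq
        if change < tol:
            break
    return freq

# Toy data: unambiguous homozygotes plus two phase-ambiguous double heterozygotes
data = ([(("A1", "A1"), ("B8", "B8"))] * 4
        + [(("A2", "A2"), ("B44", "B44"))] * 4
        + [(("A1", "A2"), ("B8", "B44"))] * 2)
for hap, p in em_haplotypes(data).items():
    print(hap, round(p, 4))
```

In this toy data set the double heterozygotes are resolved almost entirely in favour of A1,B8 and A2,B44, because the unambiguous donors make those two haplotypes far more likely than the alternative pairing.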

The value for the minimum number of haplotypes (mnh) needed to reach the threshold of 0.50 frequency serves as a summary indicator of HLA geographical diversity in France.
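As an illustration of this indicator, the short Python sketch below sorts the haplotype frequencies of one department in decreasing order and counts how many are needed for the cumulative frequency to reach the threshold. The function name `min_haplotypes_for_threshold` and the example frequencies are hypothetical.

```python
def min_haplotypes_for_threshold(frequencies, threshold=0.5):
    """Return the smallest number of haplotypes, taken from the most
    frequent downwards, whose cumulative frequency reaches the threshold.
    Returns None if the frequencies never reach it."""
    cumulative = 0.0
    for count, freq in enumerate(sorted(frequencies, reverse=True), start=1):
        cumulative += freq
        if cumulative >= threshold:
            return count
    return None

# A department dominated by a few haplotypes has a low mnh
print(min_haplotypes_for_threshold([0.4, 0.3, 0.2, 0.1]))        # 2
# A more even distribution needs more haplotypes to reach 0.5
print(min_haplotypes_for_threshold([0.2, 0.2, 0.2, 0.2, 0.2]))   # 3
```

A low value indicates that a few haplotypes account for half of the total frequency, whereas a high value indicates many haplotypes at low frequency.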

Results

Distribution of HLA-A,B haplotypes

Most HLA-A,B haplotypes had a frequency lower than 0.001 in all French departments: only 514 of 8558 haplotypes were present at a frequency estimated to be higher than 0.001. Table 3 lists the 63 HLA-A,B haplotypes found in all 90 departments. Since they represent the most widely shared genetic background in France, they are the least difficult to locate and obtain for transplantation.

Table 3 List of HLA-A, B haplotypes present in all 90 French departments

Only 15 haplotypes were found in a single department. Most of these were present at a very low frequency (Table 4), except HLA-A3,B78 (0.003) in Landes (dept. no. 40) and HLA-A66,B50 (0.001) in Aube (dept. no. 10).

Table 4 List of HLA-A, B haplotypes found in only a single department (Dept. no.) and their frequency

Table 5 summarises the most frequent haplotypes according to department. HLA-A1,B8 was the most common in three departments (dept. nos. 1, 59, 65) with its highest frequency (0.062) in Hautes-Pyrénées (dept. no. 65). HLA-A29,B44 was the most common in 50 departments, and its frequency was highest (0.078) in Landes (dept. no. 40). The frequency of the HLA-A2,B44 haplotype was highest (0.079) in Morbihan (dept. no. 56). The trend in Corsica (dept. no. 20) was different: HLA-A2,B62 was the most frequent haplotype (0.041) and HLA-A29,B44 the least frequent (0.005). Landes (dept. no. 40) and Tarn-et-Garonne (dept. no. 82) had their most frequent haplotype (HLA-A29,B12) in common, at a frequency of 0.062 and 0.052, respectively.

Table 5 List of the most frequent HLA-A, B haplotypes in France, estimated as the mean of the frequency in each of the 90 departments. The minimum and maximum frequencies are given with the department where they occur (Depmin and Depmax, respectively)

The mnh needed to reach 50% of the total frequency was approximately 25 and correlated with the homozygosity index (ie, the probability of finding donors with a homozygous genotype) but not with the total number of haplotypes in the department (data not shown).
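The text does not state how the homozygosity index was computed. One common way to obtain such a quantity, assuming random union of haplotypes, is the sum of squared haplotype frequencies, sketched below. The function name `expected_homozygosity` and the example values are hypothetical.

```python
def expected_homozygosity(frequencies):
    """Probability that two haplotypes drawn at random are identical,
    assuming random mating: the sum of squared haplotype frequencies."""
    return sum(freq * freq for freq in frequencies)

# Two equally frequent haplotypes: half of the genotypes are homozygous
print(expected_homozygosity([0.5, 0.5]))                 # 0.5
# Four equally frequent haplotypes: a quarter are homozygous
print(expected_homozygosity([0.25, 0.25, 0.25, 0.25]))   # 0.25
```

Distributions concentrated on a few haplotypes give a higher value, and flatter distributions give a lower one.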

The lowest mnh value (18) was observed in Hautes-Pyrénées (dept. no. 65) and the highest (31) in Meurthe-et-Moselle (dept. no. 54). The department of Paris (no. 75) also had a high mnh value (29), with all haplotypes present at a very low frequency, and HLA-A29,B44 the most frequent (0.05) (Figure 1).

Distribution of HLA-A,B,DR,DQ haplotypes

In all, 5029 different HLA-A,B,DR,DQ haplotypes occurred at a frequency greater than 0.0001. Only 15 were found in all 90 departments at a frequency greater than 0.001 (Table 6). The most frequent haplotype was HLA-A1,B8,DR3,DQ2 (ranging from 0.077 in Morbihan to 0.012 in Vaucluse), followed by HLA-A29,B44,DR7,DQ2 and HLA-A3,B7,DR2,DQ1.

Table 6 List of the HLA-A, B,DR,DQ haplotypes present in all 90 French departments

Substantial geographical diversity is observed in this haplotype distribution. The mnh necessary to reach a frequency of 0.5 ranged from 54 to 60 in 26 departments (Figure 2) and exceeded 60 in the others, except for Corsica (mnh only 20 haplotypes). Departments with a low mnh value were located in the Alps (dept. no. 4), the Pyrénées (dept. nos. 9 and 65), the Massif Central (dept. no. 29) and Brittany (dept. no. 56). Conversely, Paris, as a department, had the highest mnh value (ie, 97), with each haplotype present at a low frequency.

Figure 2

Departmental distribution of minimum number of HLA-A, B,DR,DQ haplotypes needed to reach 50% of the total frequency.

Table 7 reports the 14 haplotypes found at a frequency greater than 0.005 in only one department. Only two were observed at a frequency higher than 0.01.

Table 7 List of the HLA-A,B,DR,DQ haplotypes found in a single department (Dept. no.) with a frequency greater than 0.005

Table 8 summarises the most frequent haplotypes in all 90 departments. HLA-A1,B8,DR3,DQ2 was found at a frequency of 0.077 in seven departments in the north of France, HLA-A29,B44,DR7,DQ2 was the most common haplotype in 53 departments (particularly in the south), and HLA-A3,B7,DR2,DQ1 in 28 departments (throughout the country). HLA-A2,B51,DR11,DQ3 was the most frequent (0.065) haplotype in Corsica and HLA-A2,B7,DR2,DQ1 (0.068) in Hautes-Pyrénées.

Table 8 List of the most frequent HLA-A, B,DR,DQ haplotypes in France, estimated as the mean of the frequencies in the 90 departments. The minimum and maximum frequencies are given with the department where they occur (Depmin and Depmax, respectively)

Discussion

Surnames have proved to be useful for describing the population structure, and isonymy makes it possible to evaluate isolation and consanguinity (see the recent bibliography in Barrai et al,13 Mathias et al,14 Rodriguez-Larralde,15 Lasker et al,16 Sykes et al,17 Manni et al,18 Barrai et al,19,20 Caravello et al,21 Gagnon et al22). They are also frequently used as a tool to identify individuals' geographical origin (see the recent bibliography in Chan,23 Garza-Chapa et al,24 De Silvesti and Guglielmino,25 Pollock et al,26 Jacobs and Landerdale,27 Rudan,28 Carta et al,29 Brancatelli et al30). Surname distribution in France has been accurately known since at least the end of the 19th century and has been studied extensively.31,32,33,34,35,36,37,38,39,40,41,42

Under these circumstances, the use of surnames might provide additional specificity about the geographical origin of bone marrow donors, and, consequently, increase our efficiency in locating the donors sought. First, we can easily recognise donors who have migrated to France relatively recently: their surnames are not registered in any department before 1916. This point indicates the essential arbitrariness of our definition of the ‘French population’ as people sharing names with people born and registered anywhere in France before 1916. France has traditionally welcomed immigrants from neighbouring countries (Italy, Spain, Belgium, etc), and, more recently, from Poland, Armenia, North Africa and Asia. ‘The French’ cannot therefore be realistically considered as a homogeneous genetic unit. Various studies9,32,43 have already emphasised the large genetic variability in France, which results from isolation and genetic drift in some areas and from large melting pots in others. Even these studies, however, are quite imprecise in their description of the French population structure. As Cavalli-Sforza et al44 said ‘Unfortunately, for a vast territory like France, 14 points do not permit a high resolution …’. The method described here, using surname data, allows us to circumvent this difficulty and provide an accurate geographical picture of the French genetic structure. Our principal purpose must nevertheless be borne in mind: to obtain an accurate geographical distribution of HLA haplotypes to optimise policies for the recruitment of bone marrow donors.

HLA-A,B haplotypes

We have not focused on the most frequent HLA-A,B haplotypes, found all over France (Table 3). These are usually not specific to France and are found, for example, at high frequencies in other western European countries.45 Table 4 provides more useful results inasmuch as these ‘unique’ and rare haplotypes are more difficult to locate when needed. They are either typical haplotypes attached to a given department or foreign haplotypes from recent immigration. Not surprisingly, the probability of finding these kinds of haplotypes is high in Paris (dept. no. 75) and its suburbs (dept. nos. 77 and 78), as a consequence of the convergent migration towards Paris from all departments, particularly since the 19th century. Indeed, almost all haplotypes (378 of the 514 with a frequency larger than 0.001) can be found in department 75, albeit at a very low frequency: the mnh (that is, the number needed to reach a frequency of 0.50) is 29. The most common haplotype in Paris is HLA-A29,B44 (0.05), which is also found throughout western Europe, in Finland, and even among the Zulus of sub-Saharan Africa.45 From a transplantation point of view, this means that theoretically all HLA-A,B haplotypes can be found in Paris, but at so low a frequency that without highly effective recruitment policies (for example, completion of a questionnaire about geographical origin even before typing analysis), it is nearly impossible to find a specific rare haplotype.

Most of the departments in France have different and original patterns (Figure 1). Departments that are made up of scattered villages due to hilly landscapes or hedged fields (dept. nos. 4, 9, 12, 29, 56, 65, 81) have the fewest haplotypes and lowest mnh. The rarity of immigrants contributed to the high degree of consanguinity.31 Conversely, the departments that welcomed many immigrants have the most variant haplotypes (Lorraine, dept. nos. 8, 54, 57; Ile-de-France, dept. nos. 75, 77, 10, 89; Provence, dept. nos. 30, 13). One particular example is Corsica, which has a long history of isolation within the Mediterranean basin, despite a close relation with Italy. It was annexed to France in 1768. Its mnh is only 20 haplotypes. The two most frequent haplotypes there are also found in western Europe, in particular in Spain and Italy (HLA-A2,B62), and in Spain, Portugal and Greece (HLA-A24,B35).45 Many haplotypes are also shared with the nearest mainland departments (depts. nos. 6, 83 and 13), where recent emigrants from Corsica first settled.

HLA-A,B,DR,DQ haplotypes

The frequencies of the HLA-A,B,DR,DQ haplotypes were estimated from fewer people (17 745 donors) than those of the HLA-A,B haplotypes (30 777 donors) because, until the end of 1998, donors were typed for DR and DQ only when their HLA-A, B type was potentially compatible with a recipient. This could introduce a bias into the estimation of haplotype frequencies. Thereafter, donors were simultaneously typed for HLA-A, B, DR, DQ, but unfortunately this sample is still much too small (2216) for a detailed description of all of France.

As reported above, Corsica is different from the rest of the country, with fewer HLA-A, B, DR, DQ haplotypes at higher frequencies. Figure 2 shows that again only 20 of the most frequent haplotypes were sufficient to reach the value of 0.5: Corsica has less variability than the other departments. Table 7 indicates that of the haplotypes found in only a single department, two were located in Corsica at a frequency greater than 0.01: HLA-A26,B35,DR4,DQ2 and HLA-A2,B41,DR3,DQ3. The latter haplotype contains HLA-B41, also found in Lebanon, Ukraine and Armenia,45 and HLA-A2,B41, which has a high frequency in Greece.45 The presence of these haplotypes in Corsica suggests the possibility of genetic exchange or migration between these regions and Corsica.

The frequency of the HLA-A28,B35,DR8,DQ4 haplotype shows the greatest contrast. It seems to have originated in the Pyrenees; its frequency decreases rapidly as it spreads towards neighbouring departments, and it is absent from most other departments (55 of them). This distribution may reflect the trace of the ‘Basque’ population living in France and Spain.

Although the frequencies of the HLA-A,B,DR,DQ haplotypes are slightly biased, as explained above, their departmental distributions are consistent with those of the HLA-A,B frequencies.

This detailed description of the HLA diversity in France uses a method that takes advantage of the known surname distribution to refine the results described by Lonjou et al9 and to increase the accuracy of the geographical origin of haplotypes. Doing so enhances the likelihood of localising and obtaining, at the lowest cost, the best donors for patients awaiting bone marrow transplantation from unrelated donors, particularly when their haplotypes are rare: first by identifying the geographical origin of the intended recipient, and then by using the method described here to locate the area where the probability of finding volunteer donors sharing these same haplotypes is highest.