Introduction

The current outbreak of kala-azar or clinical visceral leishmaniasis (VL) caused by Leishmania donovani sensu stricto in eastern and southern Sudan has taken its toll on an impoverished, war-stricken population, killing tens of thousands and depopulating vast areas of southern Sudan. The calamity brought kala-azar to the forefront as one of the greatest epidemics of the 20th century.1,2 Longitudinal studies in Sudan show marked differences in incidence of clinical disease between adjacent villages inhabited by different ethnic groups.3,4 Furthermore, when members of the different ethnic groups share the same immediate environment and exposure, certain ethnic groups remain at a higher risk of developing VL.5 These observations support the hypothesis that the host genotype plays an important role in disease susceptibility.

VL caused by L. donovani sensu stricto in the mouse provides a classical model of major single gene control of innate resistance to infection.6,7,8,9,10 The gene, designated Lsh, Ity or Bcg, was also shown to influence innate resistance to Salmonella typhimurium,11,12,13 Mycobacterium bovis BCG,14 M. lepraemurium15,16 and M. intracellulare.17 Following its identification by positional cloning,18 the gene was renamed the natural resistance-associated macrophage protein (Nramp1). This is now superseded by the functional designation solute carrier family 11a (proton-coupled divalent metal ion transporters) member 1 or Slc11a1, consistent with our formal demonstration that the proteins encoded by Slc11a1/SLC11A1 function as proton/divalent cation (Fe2+, Zn2+ and Mn2+) antiporters.19 Mice carrying the natural gly-to-asp mutation at position 169 in putative membrane-spanning domain number 4 are as susceptible to infection with all three groups of pathogen as gene-disrupted (‘knockout’) mice.20 The protein localises to the late endosomal/lysosomal compartment of macrophages,21,22 but not to early endosomes.21 In man, SLC11A1 has been linked to genetic susceptibility to leprosy in Vietnam23 and to tuberculosis in Brazil24 and in aboriginal Canadians.25 Indeed, SLC11A1 appears to be globally associated with tuberculosis,24,25,26,27,28,29 with evidence that both 5′ and 3′ polymorphisms contribute independently to susceptibility.26 Linkage or allelic association has also been demonstrated for HIV30 and for a wide range of autoimmune diseases in man.31,32,33,34,35,36,37 Here, we present data demonstrating a role for SLC11A1 in determining susceptibility to VL in the highly susceptible Masalit tribe in Sudan.

Subjects and methods

Ascertainment of families

The study was carried out on the Nilosaharan-speaking Masalit population who migrated from western Sudan in the early–mid 1980s to occupy villages along the Rahad River in the heart of the endemic area in eastern Sudan. Epidemiological and demographic details relating to the study site are described in detail elsewhere.3,4,38,39 Multicase families with VL were ascertained from epidemiological and medical records of the Institute of Endemic Diseases. Diagnosis was made on the basis of clinical, parasitological and serological criteria, as described.3,39 Ethical approval for this study was obtained from the Ethical Committee of the University of Khartoum. Buccal swab DNA was prepared from 312 individuals from 59 extended multicase families (67 nuclear families) with one to six affected offspring per nuclear family (Table 1). Nuclear families with one case were always part of an extended multicase pedigree.

Table 1 Family structures for the 59 Masalit families collected from El Rugab and Um Salala villages in eastern Sudan

Genotyping

Three SNPs within SLC11A1 (274C/T, 469+14G/C and D543N) were genotyped using the PCR primers, restriction enzymes and conditions described.40 The D2S1471 microsatellite41 and the SLC11A1 promoter region (GT)n repeat42 were PCR amplified using NED- and FAM-labelled forward primers, and PCR products were analysed by electrophoresis on 6% polyacrylamide gels using an automated sequencer (model ABI377, Applied Biosystems) as previously described.37 Two previously described40,43 insertion/deletion (IN/DEL) polymorphisms in the SLC11A1 3′UTR were also typed using the automated sequencer. The 3′UTR TGTG polymorphism40 was amplified using FAM-labelled (5′ TAC CTG CAG TAG GGC CA 3′) and unlabelled (5′ AAA CAG CAG GTC CCT AAA GC 3′) primers. This yielded allele sizes of 393 bp (allele 1) and 397 bp (allele 2). The 3′UTR CAAA polymorphism43 was amplified using HEX-labelled (5′ CTC CAG TTT GGA GCC TGT GT 3′) and unlabelled (5′ CTA GCG CAG CCA TGT GAT TA 3′) primers. This yielded allele sizes of 245 bp (allele 1) and 250 bp (allele 2). Microsatellites and IN/DELs were analysed using the GeneScan and Genotyper software (Applied Biosystems). Allele frequencies for the polymorphisms used in this study are shown in Table 2.

Table 2 Allele frequencies for SLC11A1 markers and D2S1471

Linkage analysis

Nonparametric linkage analysis was performed in ALLEGRO44 using the Spairs scoring function with 0.5 weighting to take account of differences in family size. ALLEGRO reports allele-sharing LOD scores and maximum Z scores for the likelihood ratio (Zlr). One-sided P-values associated with LOD and Zlr scores are used throughout. For multipoint analysis, genetic distances between markers were entered in cM calculated on the basis of the physical map distance (1 Mb=1 cM). The fraction of the total inheritance information extracted by the available marker data is indicated by the ‘information content’. Simulations performed in ALLEGRO using SLC11A1 marker information from this study demonstrated that the 59 families had 100% power to detect linkage up to a critical value equivalent to an allele-sharing LOD score >3.95 (P<0.00001) for a gene with penetrances 7–61%,45 controlling the underlying susceptibility to VL.

Allelic association testing

Family-based allelic association tests were performed using the TDT46 correcting for clustering at the nuclear family level and nonindependence between sibs using a robust sandwich estimator for the variance and the Wald χ2-test. Single-point and haplotype TDT were also performed using the ETDT47 implemented within TDTPHASE.48 Allelic and genotype associations, and relative risk estimates, were obtained by creating a ‘case/pseudo-control’ study, where the ‘cases’ comprise the genotypes of the affected offspring, and the ‘controls’ are the one to three other genotypes (depending on whether phase is known or inferred) which the affected offspring might have received from the parents.49 The relative risks were estimated using conditional logistic regression analysis, again employing robust variance estimates to control for family clustering and a score test to indicate the overall significance for allelic or genotype associations. A stepwise logistic-regression procedure49 was used to evaluate the relative importance of variants at the different sites within SLC11A1. Score tests were used to compare models in which the main effects for both loci are modelled with one in which the main effects at the primary locus only are included. Robust TDT and case/pseudo-control statistical tests implemented within Stata were developed by Heather Cordell and David Clayton at the Cambridge Institute for Medical Research, and are available at http://www-gene.cimr.cam.ac.uk/clayton/software/.
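As a minimal illustration of the idea behind the TDT, the following Python sketch counts transmissions of one allele from heterozygous parents to affected offspring at a biallelic marker and forms the classical McNemar-type statistic (b−c)²/(b+c). It is not the analysis used in this study: it omits the robust variance correction for family clustering and the multi-allelic ETDT, and the trios, function names and 1/2 allele coding are hypothetical.

```python
import math

def tdt_counts(trios, allele=2):
    """Count transmissions (b) and non-transmissions (c) of `allele`
    from heterozygous parents to affected offspring.

    Each trio is (father, mother, child); each genotype is a tuple of two
    alleles coded 1 or 2 (a biallelic marker with Mendelian-consistent
    genotypes is assumed; no consistency check is performed).
    """
    b = c = 0
    for father, mother, child in trios:
        parents = (father, mother)
        n_het = sum(1 for p in parents if p[0] != p[1])
        # Copies of the allele that necessarily came from homozygous carriers.
        from_hom = sum(1 for p in parents if p == (allele, allele))
        transmitted = child.count(allele) - from_hom
        b += transmitted
        c += n_het - transmitted
    return b, c

def tdt_statistic(b, c):
    """Classical TDT chi-square (1 df) and its P-value."""
    if b + c == 0:
        raise ValueError("no heterozygous parents, so the TDT is undefined")
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square with 1 df, via the normal tail.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical trios (father, mother, affected child); not study data.
example_trios = [
    ((1, 2), (1, 1), (1, 2)),
    ((1, 2), (1, 2), (2, 2)),
    ((2, 2), (1, 2), (1, 2)),
    ((1, 2), (1, 1), (1, 2)),
]
b, c = tdt_counts(example_trios, allele=2)
chi2, p_value = tdt_statistic(b, c)
```

Only heterozygous parents contribute to b and c, which is why the test remains valid as a test of association in the presence of linkage.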

Linkage disequilibrium

Linkage disequilibrium between pairs of markers across SLC11A1 was determined using Hedrick's definition of Lewontin's D′ statistic.50
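The calculation can be sketched in Python as follows, assuming the usual form of Hedrick's multiallelic extension, in which D′ is the sum over allele pairs of p_i q_j |D′_ij|. The haplotype frequencies are assumed to be already estimated, the function name is hypothetical, and the example frequencies are invented for illustration rather than taken from Table 6.

```python
from collections import defaultdict

def hedrick_d_prime(hap_freqs):
    """Multiallelic D' from a dict {(allele_i, allele_j): haplotype frequency}.

    Frequencies are assumed to sum to 1. For each allele pair,
    D_ij = x_ij - p_i * q_j is scaled by its maximum possible magnitude,
    and the absolute values are averaged with weights p_i * q_j.
    """
    p = defaultdict(float)  # allele frequencies at the first locus
    q = defaultdict(float)  # allele frequencies at the second locus
    for (i, j), x in hap_freqs.items():
        p[i] += x
        q[j] += x

    total = 0.0
    for i, pi in p.items():
        for j, qj in q.items():
            d = hap_freqs.get((i, j), 0.0) - pi * qj
            if d < 0:
                d_max = min(pi * qj, (1 - pi) * (1 - qj))
            else:
                d_max = min(pi * (1 - qj), (1 - pi) * qj)
            if d_max > 0:  # skip pairs involving a monomorphic locus
                total += pi * qj * abs(d) / d_max
    return total

# Invented haplotype frequencies for two biallelic markers.
complete_ld = {(1, 1): 0.6, (2, 2): 0.4}
equilibrium = {(1, 1): 0.25, (1, 2): 0.25, (2, 1): 0.25, (2, 2): 0.25}
partial_ld = {(1, 1): 0.5, (1, 2): 0.1, (2, 1): 0.1, (2, 2): 0.3}
```

For two biallelic markers this reduces to the absolute value of the familiar pairwise D′, so values range from 0 (equilibrium) to 1 (complete disequilibrium).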

Sequence analysis

Direct cycle sequence analysis was performed on DNA samples isolated from 29 cases (25 for exon 4a) from 27 families, and seven unaffected (two children, five adults), used in the linkage and association studies. The affected individuals were selected from a subset of families that showed the highest LOD scores for linkage to SLC11A1. Separate PCR assays were designed to amplify the 15 exons, intron 1 and all the intron/exon boundaries of SLC11A1, the region of intron 4 (nt 3046–3503) around and including alternatively spliced exon 4a51 (nt 3191–3264), 600 bp of sequence upstream of the transcription start site and 819 bp of sequence downstream of the TAG stop codon. The TOPO TA Cloning® kit (Invitrogen, living science) was used to clone PCR-amplified products to facilitate sequence analysis of the 58 individual chromosomes, especially through the promoter region repeats. However, it was not possible to determine whether each new SNP was in cis or trans to SNPs identified on other PCR products in the same individual.

Results

Linkage analysis

A breakdown of family structures for the 59 multicase families used in this study is given in Table 1. Allele frequencies for the SLC11A1 and D2S1471 loci derived from genetically independent individuals in the families are shown in Table 2. Multipoint nonparametric linkage analysis in ALLEGRO (Table 3) provides evidence (Zlr (Spairs) scores 2.55–2.38; 0.008≤P≤0.012; information content 0.88) for linkage between VL and markers across SLC11A1, dropping away at D2S1471 (Zlr (Spairs) score 1.99; P=0.028; information content 0.88) lying 68 kb distal (telomeric) to the 3′ end of SLC11A1.

Table 3 Multipoint nonparametric linkage analysis using the Spairs function in ALLEGRO44 to examine linkage between the VL phenotype and markers across the SLC11A1 and adjacent D2S1471 region

Family-based allelic association

Extended transmission disequilibrium testing (ETDT)47 for single markers implemented within TDTPHASE48 showed significant global associations (Table 4a) for the 5′ promoter (GTn) and intragenic single-nucleotide polymorphisms (SNPs) at 274C/T and 469+14G/C, but not for the 3′ exon 15 SNP D543N, the 3′UTR insertion/deletion (IN/DELs) polymorphisms, or for the D2S1471 microsatellite lying 68 kb distal to SLC11A1. Single-point TDT P-values at 274C/T and 469+14G/C retained significance after correcting for family clustering using TDT with a robust sandwich estimator for the variance (Wald χ2 with 1 df=5.59, P=0.0171 for 274C/T; Wald χ2 with 1 df=5.56, P=0.0184 for 469+14G/C), that is, markers within SLC11A1 show true allelic association with disease in the presence of linkage. Significant associations for 5′ markers at SLC11A1 were supported by the case/pseudo-control logistic regression analysis (Table 5a) that showed significant allelic associations at the GTn repeat, as well as for the 274C/T and 469+14G/C biallelic markers. A borderline significant χ2 test comparing allele-wise (1 df) and genotype-wise (2 df) tests for GTn allele 3 indicates dominance rather than a simple multiplicative (ie both alleles contributing equally) model. The genotype-relative risks for carrying one or two copies of allele 3 at the GTn were 9.63 (95% CI 1.07–86.96; P=0.044) and 10.20 (95% CI 1.06–98.16; P=0.044), respectively. The allele-wise relative risk associated with allele 4 at the GTn was 6.00 (95% CI 1.50–23.99; P=0.011), with allele 2 at 274C/T was 2.00 (95% CI 1.16–3.44; P=0.012), and with allele 2 at 469+14G/C was 0.38 (95% CI 0.20–0.71; P=0.003). No dominance effects were observed for 274C/T and 469+14G/C. 
TDTPHASE (Table 4a) showed significant global associations (P-values between 0.0089 and 0.011) for 5′ (ie GTn-274C/T-469+14G/C) intragenic haplotypes and for haplotypes extending from 5′ to 3′ markers within SLC11A1 (0.006≤P≤0.033), but not for haplotypes involving only the 3′ markers (ie D543N-TGTG-CAAA). The 5′ haplotype associations involved a significant bias in the transmission of haplotype 3-2-1 (P=0.003) to the affected offspring, and significant protection (P=0.005) associated with haplotype 2-1-2 (Table 4b). Bias (borderline significant) in transmission of the haplotype 4-1-2 to the affected offspring occurred in a single extended pedigree.
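The allele-wise relative risks and confidence intervals reported above can be checked for internal consistency with a short Python sketch. It assumes that each 95% CI is a symmetric Wald-type interval on the log scale, recovers the standard error of log(RR) from the interval, and derives an approximate two-sided P-value. This is only an approximation to the robust conditional logistic regression used in the study, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def wald_from_ci(rr, lower, upper, level=0.95):
    """Recover SE(log RR), the Wald z and a two-sided P-value from a CI.

    Assumes the interval is exp(log(rr) +/- z_crit * se), i.e. symmetric
    on the log scale.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(rr) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p_value

# Relative risks and 95% CIs as reported in the text.
reported = {
    "GTn allele 4": (6.00, 1.50, 23.99),
    "274C/T allele 2": (2.00, 1.16, 3.44),
    "469+14G/C allele 2": (0.38, 0.20, 0.71),
}
approx_p = {name: wald_from_ci(*vals)[2] for name, vals in reported.items()}
```

Under these assumptions the approximate P-values round to the reported values of 0.011, 0.012 and 0.003, respectively.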

Table 4 (a) Global allelewise significance levels (P-values) for family-based SLC11A1 allelic and haplotype associations with VL obtained using ETDT47 implemented within TDTPHASE.48 (b) Specific haplotype associations for the three 5′ markers (GTn−274C/T−469+14G/C) showing significant individual marker and haplotype global associations between SLC11A1 and VL in Sudan
Table 5 ‘Case/pseudo-control’ allelic association testing49 for SLC11A1 markers and VL. (a) Shows P-values for conditional logistic regression analysis for allelic (1 df tests) and genotype (2 df tests) associations at the six polymorphic markers. A significant χ2 test comparing the difference between 1 and 2 df tests indicates dominance rather than a multiplicative model. (b) Shows the results of score tests of the main effects of a locus in a forward stepwise regression procedure. Robust variance estimates to control for family clustering were used throughout

Of the markers associated with VL in the 5′ region of SLC11A1, only the promoter GTn is known to be functional in regulating the expression of SLC11A152 and yet alleles at this locus were associated with only borderline significance. To determine whether the GTn contributed significant main effects to the haplotype associations observed, we carried out tests (Table 5b) to determine the main effects at each locus in a forward stepwise logistic regression procedure.49 For this stepwise analysis, we compared the contribution of the main disease-associated allele 3 at the GTn with the other two biallelic markers. At the first stage (rows 1–3), the GTn (borderline, P=0.05), 274C/T (P=0.01) and 469+14G/C (P=0.003) are all significant when included in the model on their own, that is, without accounting for effects at other loci. Once the GTn is included in the model, both the 274C/T (row 4) and the 469+14G/C (row 6) add significant main effects. Once the 274C/T is included, only the 469+14G/C (row 8) and not the GTn (row 5) has a main effect. Once the 469+14G/C is included, neither the GTn (row 7) nor the 274C/T (row 9) have main effects. Overall, this stepwise analysis indicates that all of the association between SLC11A1 and the disease can be accounted for by the 469+14G/C polymorphism. This is consistent with the observation (Table 6) that the 469+14G/C is in strong linkage disequilibrium with both the GTn (D′=0.7136) and the 274C/T (D′=0.8919), even though the latter are not in strong linkage disequilibrium with each other.

Table 6 Pairwise linkage disequilibrium between markers across SLC11A1 calculated using the Lewontin's D′50

Sequence analysis

The results of stepwise regression analysis for markers across the 5′ region of SLC11A1 suggested that all of the association with VL is accounted for by the polymorphism at 469+14G/C. Although the functional GTn might contribute to the overall 469+14G/C effect, the results suggest that other functional polymorphisms in linkage disequilibrium with 469+14G/C must be present and contribute. Since the intronic 469+14G/C SNP is itself unlikely to influence function, further sequence analysis was required to determine whether novel polymorphisms that could account for this association occur within the Masalit populations of eastern Sudan. To this end, we sequenced 600 bp upstream of the transcription start site, intron 1, all 15 exons and intron/exon boundaries, the region of intron 4 containing the alternatively spliced exon 4a, and 819 bp downstream of the TAG stop codon in 29 cases from 27 families and seven unaffected individuals used in the linkage and association studies. Table 7 shows the new SNPs identified, together with allele frequencies in the 29 cases for the previously identified40,42,43 polymorphic markers that were included in the sequence analysis. To conform with standard notation, all previously identified and new SNPs are named in relation to the transcription start site located 148 bp 5′ of the methionine start codon.42 Previous notations40,43 were assigned relative to an arbitrary site 76 bp upstream of the methionine start codon. Eight new SNPs were identified (Table 7), only one of which occurred in the coding region, at codon 39 in exon 2. This was a silent C-to-T substitution that leaves the encoded amino acid unchanged and so would not cause a functional change in the SLC11A1 protein. Five other SNPs occurred at the intron/exon boundaries, and one in the 3′UTR. All of these were again unlikely to mediate functional changes.
Sequence analysis within intron 4 confirmed the presence of exon 4a in this Sudanese population, but no variants that might relate through linkage disequilibrium to the significance associated with the intron 4 variant 469+14G/C were found. One new SNP 86 nucleotides 5′ of the transcription start site (ie position –86 bp) was located within a putative nuclear factor kappa B (NFκB)-binding site, which could be functional in the regulation of gene expression. The wild-type allele shared seven out of 10 nucleotides with the known NFκB consensus GGGRHTYYCC (where, according to the IUPAC nucleotide ambiguity codes, R=G or A; H=C, A or T and Y=C or T),53 while the mutant shared only six out of 10 nucleotides. This variant occurred in only four (0.06) of the 58 chromosomes sequenced, in four affected individuals (ie heterozygous in all), and so was unlikely to account for disease susceptibility on its own. The variant allele occurred in cis with allele 3 of the previously identified GTn polymorphism in one affected individual and in cis with allele 2 for three affected individuals.
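The comparison against the consensus GGGRHTYYCC amounts to counting the positions at which a 10-nucleotide site is compatible with the ambiguity codes. A minimal Python sketch of that count is given here; the function name is hypothetical, and the example sites are invented 10-mers chosen to score 10, 7 and 6 matches, not the actual wild-type and variant sequences at position –86, which are not reproduced in the text.

```python
# Nucleotide codes needed for the consensus (R = A/G, H = A/C/T, Y = C/T).
AMBIGUITY = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "H": "ACT",
}

def consensus_matches(site, consensus="GGGRHTYYCC"):
    """Count positions at which `site` is compatible with `consensus`."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(
        base in AMBIGUITY[code]
        for base, code in zip(site.upper(), consensus)
    )

# Invented example sites (not the Sudanese sequences).
full_match = consensus_matches("GGGACTTTCC")    # every position compatible
seven_of_ten = consensus_matches("GAGACTGTCA")  # mismatches at 2, 7 and 10
six_of_ten = consensus_matches("GAGACTGTTA")    # one further mismatch at 9
```

A single substitution changes the score by at most one, which is the size of the difference described between the wild-type (7/10) and variant (6/10) alleles.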

Table 7 Results of sequence analysis across 600 bp upstream of the transcription start site, all 15 exons, intron 1 and all intron/exon boundaries, and 819 bp downstream of the TAG stop codon in 29 cases from 27 of the families and seven unaffected individuals, used in the linkage and association studies

Discussion

The data presented here provide evidence (maximum multipoint Zlr=2.55; P=0.008) for linkage between SLC11A1 and susceptibility to VL in members of the Masalit tribe in eastern Sudan, thus replicating the recent report54 of linkage (maximum multipoint LOD score=1.08; P=0.01) between SLC11A1 and VL in the related Aringa ethnic group to the north of our study population in Sudan. Our study extended this finding using family-based TDT and logistic regression analysis to demonstrate allelic association with 5′ (GTn, 274C/T, 469+14G/C) but not 3′ (D543N, 3′UTR TGTG, 3′UTR CAAA) markers within SLC11A1. In a previous case–control analysis, Bellamy et al26 found a significant association between four intragenic SLC11A1 polymorphisms (CAn=GTn, INT4=469+14G/C, D543N and 3′UTR TGTG) and susceptibility to pulmonary tuberculosis in The Gambia. As with our study in Sudan, the two polymorphic markers in the 5′ region of the gene (CAn and INT4) were in strong linkage disequilibrium, as were the two markers in the 3′ region (D543N and 3′UTR TGTG), but markers in the 5′ region were not in linkage disequilibrium with the markers in the 3′ region. Since they had observed significant associations at all four markers, they concluded that separate polymorphisms in the 5′ and 3′ regions of the gene acted independently to control susceptibility to tuberculosis.26 In Sudan, we found no evidence either for association between VL and markers in the 3′ region of SLC11A1, or for any novel change-of-function mutations across this region. Nor did we find evidence for any novel (functional) variant in or around the alternatively spliced exon 4a that might account, through linkage disequilibrium, for the main effects observed at 469+14G/C using stepwise logistic regression analysis.
Previous studies51 have shown that exon 4a, encoded by an Alu element within intron 4, is transcribed in vivo but would introduce a termination codon in exon V, resulting in a truncated and hence nonfunctional SLC11A1 protein. At the mRNA level, the ratio of transcripts with/without exon 4a was 1:5 in macrophage cell lines.51 Hence, we hypothesised that any variant (eg elimination of the splice acceptor site for transcription of the alternatively spliced product) that caused a change in this ratio might influence the amount of functional SLC11A1 protein expressed. If such a variant occurs, it is not encoded at the site of exon 4a. However, variants in the promoter in linkage disequilibrium with 469+14G/C could be responsible for regulating the level of normal SLC11A1 transcribed, and/or the ratio of normal to alternatively spliced product. In our study, there were no novel change-of-function mutations in the 5′ coding region or intron 1 of SLC11A1 that would, on their own, account for the association with 469+14G/C. This suggests that there are further functional polymorphisms upstream of the GTn in linkage disequilibrium with 469+14G/C that might contribute to these main effects. This could include the potentially functional variant at –86G/A, although the low frequency of this novel allele means that it is unlikely to account for all of the association with 469+14G/C that we have observed. It is also possible that linkage disequilibrium might extend 230 kb upstream to the gene (IL8RA) encoding the receptor for interleukin 8, although confirmation of the role of Slc11a1 in susceptibility to VL using gene knockout in mice20 makes SLC11A1 the most likely candidate for our associations with disease in man. Work is in progress to sequence a larger region of upstream sequence in the SLC11A1 promoter, and to carry out functional reporter gene analysis of the –86G/A variant.
For the moment, we conclude that the previously identified40,42 promoter region GTn polymorphism remains the only functional52 variant identified to date at SLC11A1. Since it is in strong linkage disequilibrium with 469+14G/C, it likely makes a significant contribution to the haplotype associations observed and some discussion of its role is therefore of interest.

Firstly, it was of interest that the main disease-associated haplotype carried allele 3 at the GTn, given the previous demonstration52 that this allele drives high levels of reporter gene (and hence SLC11A1) expression. The reporter gene constructs carried the four allelic variants of the GTn and incorporated 571–587 bp of promoter sequence 5′ of the ATG initiation codon.52 Although these constructs differed only at the GTn repeat, it is possible that other variants upstream of the GTn may modulate the promoter activity of allele 3 in macrophages in vivo. Nevertheless, in our previous studies31,37 examining the role of this polymorphism in determining susceptibility to autoimmune vs infectious disease, we proposed that the high-expressing allele 3 would be associated with proinflammatory responses and autoimmune disease, while the low-expressing allele 2 would be associated with low antimicrobial activity and susceptibility to infectious disease. For those studies in which the promoter GTn has been studied, this hypothesis has held true: allele 2 has been associated with susceptibility to pulmonary tuberculosis24,26,28 and allele 3 with autoimmune diseases.31,33,36,37,55 In relation to infectious disease, the one exception to date is HIV, where allele 3 was associated with disease.30 Given the propensity of tumour necrosis factor α (TNFα) to drive high levels of HIV expression via NFκB signalling,56 association with a high SLC11A1-expressing allele is not surprising. Similarly, VL is known to be associated with a high proinflammatory TNFα response and cachexia.57 In Sudan, it is possible that high levels of expression of SLC11A1 driven by allele 3 have a detrimental effect on VL susceptibility by causing a high proinflammatory response.
Conversely, in a recent study in The Gambia,58 we demonstrated that the tuberculosis-associated allele 2 at the GTn repeat was associated with high anti-inflammatory interleukin 10 responses. Hence, the delicate balance between the positive requirement for macrophage activation and antimicrobial activity and the negative effect of proinflammatory TNFα, both of which are pleiotropic effects of SLC11A1,59,60 appears to operate across the spectrum of infectious diseases associated with polymorphism at SLC11A1 and not just in comparing infectious vs autoimmune disease susceptibility. Interestingly, in one family, disease was associated with the low-expressing52 allele 4 of the SLC11A1 GTn polymorphism, even though it was carried on an otherwise protective 1–2 haplotype for 274C/T-469+14G/C. Whether, in this case, individuals are susceptible to VL because they fail to express SLC11A1, or because allele 4 was on a haplotype with other functional polymorphisms in the SLC11A1 promoter, will require further investigation. Studies are in progress to examine SLC11A1 expression at the RNA and protein levels in individuals carrying the different SLC11A1 haplotypes that we have seen associated with disease in this population. We will also determine whether the –86G/A variant, or other variants upstream in the promoter region, modulate the function of the GTn repeat polymorphism.

Overall, our results have shown that SLC11A1 is associated with susceptibility to VL in eastern Sudan. Further work is underway to determine the precise functional basis to disease-promoting variants in the SLC11A1 gene, and to determine which of the many pleiotropic effects of the gene most affects disease phenotype in man. This we hope will contribute to a better understanding of the molecular basis to disease susceptibility, and the development of therapies appropriate to the susceptible individuals in this high-incidence population.