Main

It was recognized over half a century ago that malaria has been a major force of evolutionary selection on the human genome and that certain hematological disorders have risen to high frequency in malaria-endemic areas because they reduce the risk of death due to malaria1,2,3. Sickle hemoglobin (HbS) and glucose-6-phosphate dehydrogenase (G6PD) deficiency are often-quoted examples of natural selection due to malaria, and many other genetic associations with resistance or susceptibility to malaria have been reported2,3,4,5,6,7,8,9. However, the current literature contains many conflicting lines of evidence based on relatively small studies whose results have not been independently replicated.

To address this problem, we conducted a large multicenter case-control study of severe malaria across 12 locations in Burkina Faso, Cameroon, The Gambia, Ghana, Kenya, Malawi, Mali, Nigeria, Tanzania, Vietnam and Papua New Guinea (Supplementary Fig. 1 and Supplementary Table 1). The structure of this consortial project has been described elsewhere10, and information about each of the partner studies can be found on the Malaria Genomic Epidemiology Network (MalariaGEN) website (see URLs). We used the World Health Organization (WHO) definition of severe malaria, which comprises a broad spectrum of life-threatening clinical complications of P. falciparum infection11,12,13,14,15. In this report, we examine genetic associations with severe malaria in general and with two distinct clinical forms of severe malaria: cerebral malaria with a Blantyre coma score of less than 3 and severe malarial anemia with a hemoglobin level of less than 5 g/dl or a hematocrit level of less than 15%.

Results

Samples and clinical data

The first stage of work was to collect standardized clinical data on severe malaria from multiple locations (Supplementary Table 2). This effort presented many practical challenges, as severe malaria is an acute illness that mainly occurs in resource-poor settings where laboratory facilities are limited and medical records can be unreliable. It was necessary to allow for variations in the design and implementation of the study in different settings, with study characteristics depending on a range of local circumstances. Investigators at different sites agreed at the outset on principles for sharing data and on standardized clinical definitions, and they also worked together to define best ethical practices across different local settings, including the development of guidelines for informed consent10,16,17. A set of web tools was developed to enable investigators to curate data in their locally used format before transforming them to the standardized format necessary for data from different sites to be merged.

After data curation and quality control (Online Methods), 11,890 cases of severe malaria and 17,441 controls were included for analysis (Table 1 and Supplementary Table 3). Controls were intended to be representative of the populations to which the cases belonged; that is, a minority of controls may have subsequently gone on to develop severe malaria. The ancestry composition of the cases and controls at each location is shown in Supplementary Table 4. A total of 6,283 cases had cerebral malaria or severe malarial anemia, of which 3,345 had cerebral malaria only, 2,196 had severe malarial anemia only and 742 had both cerebral malaria and severe malarial anemia (Table 1). A further 5,607 cases did not have cerebral malaria or severe malarial anemia according to the criteria used here but satisfied the WHO definition of severe malaria, which includes a range of other clinical complications such as acidosis, respiratory distress and hypoglycemia that are not explored in detail in the present analysis11.
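The case breakdown above can be cross-checked with a few lines of arithmetic. The following Python sketch uses only the counts quoted in the text; the variable names are illustrative, and the totals for any cerebral malaria and any severe malarial anemia are derived here by adding the overlapping group to each single-phenotype group.

```python
# Case counts quoted in the text (Table 1).
cerebral_only = 3_345   # cerebral malaria without severe malarial anemia
anemia_only = 2_196     # severe malarial anemia without cerebral malaria
both = 742              # cerebral malaria and severe malarial anemia
other_severe = 5_607    # severe malaria by WHO criteria, neither phenotype

# Cases with at least one of the two phenotypes.
cm_or_sma = cerebral_only + anemia_only + both

# All severe malaria cases included for analysis.
all_cases = cm_or_sma + other_severe

# Derived totals: every case with each phenotype, with or without the other.
any_cerebral = cerebral_only + both
any_anemia = anemia_only + both

print(cm_or_sma, all_cases, any_cerebral, any_anemia)
```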

Table 1 Clinical phenotype case counts and percentage of case fatalities

Most of the cases of severe malaria were young children, with median ages ranging from 1.3 to 3.8 years at different study sites, except in Vietnam where most cases were young adults with a median age of 29 years. The median age for severe malarial anemia was 32 months, and the median age for cerebral malaria was 62 months (Supplementary Table 3). Cerebral malaria was more common than severe malarial anemia in The Gambia, Kenya, Malawi and Vietnam, whereas severe malarial anemia was more common than cerebral malaria in Burkina Faso, Cameroon, Ghana, Mali, Papua New Guinea and Tanzania. It has previously been observed that severe malarial anemia particularly affects young African children exposed to extremely high levels of malaria transmission, whereas cerebral malaria particularly affects older African children exposed to lower levels of malaria transmission18,19. The mean case fatality rate after treatment was 18% for cerebral malaria, 4% for severe malarial anemia and 10% for severe malaria overall (Table 1).

Genetic loci analyzed

We tested previously reported associations with severe malaria for 55 SNPs in 27 gene regions: ABO, ADORA2B, ATP2B4, C6, CD36, CD40LG, CR1, ACKR1 (DARC), G6PD, GNAS, HBB, ICAM1, IL1A, IL1B, IL4, IL10, IL13, IL22, IRF1, LTA, NOS2, SPTB, TLR1, TLR4, TLR6, TLR9 and TNF (for corresponding references, see Supplementary Tables 5, 6, 7). Note that this report focuses on SNPs that we were able to reliably genotype using the Sequenom platform; that is, we do not here consider structural variants such as those responsible for α-thalassaemia. Other SNPs were included for the purposes of quality control (Supplementary Fig. 2 and Supplementary Tables 5, 6, 7). All SNPs were initially tested for association with severe malaria using a standard logistic regression method assuming fixed effects across different populations and under a variety of models of inheritance (Online Methods). We repeated these tests of association separately for the severe malaria subphenotypes of cerebral malaria and severe malarial anemia. Results for each SNP are shown in Supplementary Figure 3 and Supplementary Tables 8, 9, 10, and below we discuss those SNPs that showed strong evidence of association in the multicenter analysis.

HBB. The HBB gene encodes β-globin, which has three well-known structural variants that have been associated with resistance to malaria: hemoglobin S (HbS), hemoglobin C (HbC) and hemoglobin E (HbE)2,7,8,9. The SNP responsible for HbS, rs334, was present at all the African sites, with heterozygote frequencies in controls ranging from 0.05 (Malawi) to 0.22 (Nigeria). Heterozygotes had reduced risk of severe malaria (OR = 0.14; P = 1.6 × 10−225), cerebral malaria (OR = 0.11; P = 4.7 × 10−88) and severe malarial anemia (OR = 0.11; P = 9.3 × 10−65) (Fig. 1, Table 2 and Supplementary Table 11). The SNP responsible for HbC, rs33930165, was present only in individuals from West Africa (Burkina Faso, The Gambia, Ghana, Mali, Cameroon and Nigeria), with frequencies of the derived (non-ancestral/non-reference) alleles ranging from 0.01 to 0.15 in controls. The strongest signal of association with severe malaria was seen under an additive genetic model; that is, the greatest protective effect was seen in homozygotes. Each copy of the derived allele reduced the risk of severe malaria by 29% (OR = 0.71; P = 6.9 × 10−9), reduced the risk of cerebral malaria by 28% (OR = 0.72; P = 0.01) and reduced the risk of severe malarial anemia by 26% (OR = 0.74; P = 2.1 × 10−3) (Table 2). The SNP responsible for HbE, rs33950507, was found at a derived allele frequency of 0.4 in the S'tieng ancestry group in Vietnam but was rare or absent in other ancestry groups, such that the sample was too small to estimate association with severe malaria.
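The percentage reductions quoted above follow directly from the odds ratios as (1 − OR) × 100. The Python sketch below reproduces that conversion for the reported values and, as an illustration of the additive model, squares the per-allele odds ratio to obtain the implied odds ratio for HbC homozygotes. The function name is hypothetical, and the homozygote figure is derived here on the assumption of a strictly multiplicative per-allele effect on the odds; it is not a value reported in the text.

```python
def risk_reduction_percent(odds_ratio: float) -> int:
    """Percentage reduction in odds implied by an odds ratio below 1."""
    return round((1.0 - odds_ratio) * 100)

# Odds ratios quoted in the text.
reported = {
    "HbS heterozygotes, cerebral malaria": 0.11,
    "HbS heterozygotes, severe malarial anemia": 0.11,
    "HbC per allele, severe malaria": 0.71,
    "HbC per allele, cerebral malaria": 0.72,
    "HbC per allele, severe malarial anemia": 0.74,
}

for label, odds_ratio in reported.items():
    print(f"{label}: {risk_reduction_percent(odds_ratio)}% reduction")

# Additive model on the log-odds scale: each copy of the derived allele
# multiplies the odds by the same factor, so two copies square it.
hbc_homozygote_or = 0.71 ** 2
print(f"Implied HbC homozygote OR: {hbc_homozygote_or:.2f}")
```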

Figure 1: Forest plots for association with severe malaria and subphenotypes.

ORs and 95% CIs (gray bars) are shown for the sickle cell trait (rs334, heterozygote model), blood group O (rs8176719, recessive model), ATP2B4 (rs10900585, dominant model), G6PD deficiency (rs1050828, additive model) and CD40LG (rs3092945, recessive model) for association with cerebral malaria (red circles) and severe malarial anemia (blue circles) in all individuals combined. Results are adjusted for sex, ancestry and (with the exception of rs334) the sickle cell trait. Results are not presented when the sample size was too small (fewer than five cases or controls with the relevant genotype) or for locations where the derived allele was absent. Further details are available in Supplementary Tables 11, 12, 13, 14, 15, 16, 17, 18, 19. OR = 1, representing no effect, is highlighted by the vertical dashed lines.

Table 2 Autosomal SNPs with strong association signals

ABO. The ABO gene encodes the glycosyltransferase enzyme that determines ABO blood group. Individuals who are homozygous for a single-nucleotide deletion (rs8176719) in ABO have an inactive form of the glycosyltransferase and can be classified as blood group O (ref. 20). Estimated by this criterion, the frequency of blood group O in control samples ranged from 0.32 in Papua New Guinea to 0.62 in Nigeria (Supplementary Table 12). Aggregated analysis across all sites showed that blood group O was associated with decreased risk of severe malaria (OR = 0.74; P = 5.0 × 10−32), with reduced risk of cerebral malaria (OR = 0.73; P = 8.9 × 10−16) and with reduced risk of severe malarial anemia (OR = 0.68; P = 7.9 × 10−14) (Fig. 1 and Table 2). We also analyzed rs8176746, a nonsynonymous coding SNP in ABO that is in linkage disequilibrium (LD) with rs8176719 and determines the production of B antigens such that the majority of individuals carrying the derived allele express blood group B (refs. 20,21,22). The derived allele was associated in a dominant gene model with increased risk of severe malaria (OR = 1.25; P = 2.0 × 10−17) (Table 2).

G6PD. The G6PD gene is an X-linked gene encoding glucose-6-phosphate dehydrogenase with many allelic variants23. The major form of G6PD enzyme deficiency in Africa is encoded by the derived allele of rs1050828 (G6PD c.202C>T), commonly known as G6PD+202T24. Two other SNPs that cause G6PD deficiency were found in individuals from The Gambia but were rare in the other populations studied here25. In this study, the G6PD+202T allele was present at frequencies ranging from 0.03 in The Gambia to 0.28 in Nigeria (Supplementary Table 13). Aggregated across all African sites, we found an increased risk of severe malarial anemia in male hemizygotes (OR = 1.49; P = 3.6 × 10−5) and in female homozygotes under a recessive model of association (OR = 1.94; P = 1.9 × 10−3) (Table 3 and Supplementary Tables 14 and 15). In contrast, there was a trend toward decreased risk of cerebral malaria in female heterozygotes (OR = 0.87; P = 0.06) and male hemizygotes (OR = 0.81; P = 0.01) (Supplementary Tables 14, 16 and 17). Below, we discuss this heterogeneity of effect in more detail. Similar but weaker trends were observed for rs1050829, which marks the ancestral lineage on which G6PD+202 originated24.

Table 3 X-chromosome SNPs with strong association signals

ATP2B4. The ATP2B4 gene, encoding a calcium transporter found in the plasma membrane of erythrocytes, has been identified by genome-wide association study (GWAS) as a malaria resistance locus26. We typed four SNPs in this gene that were found to be in LD; the derived alleles of rs10900585 and rs55868763 were associated with increased risk of severe malaria, whereas the derived alleles of rs4951074 and rs1541255 were associated with decreased risk (Table 2 and Supplementary Tables 8, 9, 10 and 18). When aggregated across all African sites, individuals carrying at least one copy of the derived allele at rs10900585 had an OR of 1.32 for severe malaria (P = 1.7 × 10−9), whereas individuals homozygous for the derived allele at rs4951074 had an OR of 0.77 (P = 7.6 × 10−7). In both cases, the magnitude of the genetic effect was similar for cerebral malaria and severe malarial anemia (Fig. 1).

CD40LG. The CD40LG gene is a gene on the X chromosome encoding CD40 ligand that has previously been associated with severe malaria27. Homozygotes for the derived allele of a SNP in the 5′ UTR (rs3092945) showed reduced risk of severe malaria (OR = 0.85; P = 1.1 × 10−6), with a similar trend of protection in both males (OR = 0.90; P = 0.01) and females (OR = 0.78; P = 8.9 × 10−5) when the data were aggregated across sites (Table 3). However, when sites were analyzed individually, the results were strikingly different between sites: homozygotes for the derived allele showed significantly reduced risk of severe malaria in The Gambia (OR = 0.54; P = 2.3 × 10−22) but significantly increased risk in Kenya (OR = 1.42; P = 7.8 × 10−6) (Supplementary Table 19).

Other loci. None of the other loci tested here showed consistent evidence of association with severe malaria in the multicenter analysis with a significance of P < 1 × 10−4. All variants tested, some of which had weak associations that merit further investigation, are shown in Supplementary Figure 3 and Supplementary Tables 8, 9, 10. At the CD36 locus, heterozygotes for the codon variant rs201346212 tended to have reduced risk of severe malaria (OR = 0.67; P = 4.2 × 10−4). Other weak signals of association (P values in the range of 0.05 to 0.001) were observed for CD36, IL1A and IRF1 with severe malaria overall, for CR1 and IL4 with cerebral malaria and for IL20RA with severe malarial anemia. Although it is clear from these data that many genetic associations reported in the literature might have been false positives, as has been observed for other common diseases28, it is undoubtedly also the case that authentic genetic associations might be missed by multicenter studies if the effect is weak and there is heterogeneity of effect across different study sites.

Epistasis between significantly associated loci

Epistasis between malaria resistance loci has been reported in previous studies29,30. We therefore tested for pairwise interaction between all SNPs that showed significant association at the HBB, ABO, G6PD, ATP2B4 and CD40LG loci (Supplementary Fig. 4 and Supplementary Table 20). This analysis did not identify any strong evidence of interaction, but a marginally significant effect was observed between the ATP2B4 locus (rs10900585) and the allele for HbC (rs33930165; P = 1.3 × 10−3), such that the ancestral allele of rs10900585, which was the minor allele in Africa, tended to reverse the protective effect of the HbC allele. This association warrants further investigation, as ATP2B4 encodes the major erythrocyte calcium channel and intracellular calcium levels have been noted to affect the clinical phenotype of sickling disorders31.

Heterogeneity of effect

The large sample size of this study allowed us to investigate the heterogeneity in effect of malaria resistance loci in greater detail than has hitherto been possible. When associations with HBB, ABO, G6PD, ATP2B4 and CD40LG were analyzed by population and when cerebral malaria and severe malarial anemia were treated as separate phenotypic entities, various different patterns were observed (Fig. 1). Standard approaches to genetic association analysis, such as the logistic regression methods used above, apply a fixed-effects model that assumes that true associations should be constant across different studies. However, patterns of disease association could potentially vary owing to a range of genetic, environmental and biological factors, and by understanding this variation we might gain important scientific insights. We therefore used a Bayesian statistical framework to evaluate different models of genetic association, allowing for heterogeneity of effect (see the Online Methods and ref. 32). In essence, this approach weighs up the evidence that a genetic effect is fixed or heterogeneous when it is compared across different populations and clinical phenotypes, for example, cerebral malaria and severe malarial anemia. Here we allowed for two kinds of heterogeneous effect: correlated effects, which are not fixed but tend to behave in a similar way, and independent effects, which have no tendency to behave in a similar way. We estimated the posterior probability of each model under the a priori assumption that all models were equally likely. The results for specific SNPs in HBB, ABO, G6PD, ATP2B4 and CD40LG are shown in Figure 2, and the results for all SNPs are shown in Supplementary Figure 5.
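The model comparison described above can be sketched in Python as follows. With equal prior probability on each model, the posterior probability of a model is its marginal likelihood divided by the sum over all models. The nine models are formed by crossing three effect structures across phenotypes within a site with the same three structures across sites. The log marginal likelihood values below are invented purely for illustration; how they are actually computed is described in the Online Methods and ref. 32 and is not reproduced here.

```python
import math
from itertools import product

STRUCTURES = ("fixed", "correlated", "independent")
# Each model is a pair: (structure across phenotypes, structure across sites).
MODELS = list(product(STRUCTURES, STRUCTURES))

def posterior_probabilities(log_marginal_likelihoods):
    """Posterior over models under a uniform prior, computed stably in log space."""
    log_prior = -math.log(len(log_marginal_likelihoods))
    log_joint = {m: ll + log_prior for m, ll in log_marginal_likelihoods.items()}
    peak = max(log_joint.values())
    total = sum(math.exp(v - peak) for v in log_joint.values())
    return {m: math.exp(v - peak) / total for m, v in log_joint.items()}

# Hypothetical log marginal likelihoods: one model fits better than the rest.
log_ml = {model: -100.0 for model in MODELS}
log_ml[("independent", "fixed")] = -97.0

posterior = posterior_probabilities(log_ml)
for model, prob in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(model, round(prob, 3))
```

Because the prior is uniform, it cancels in the normalization; it is kept explicit in the sketch so that unequal priors could be substituted.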

Figure 2: Genetic heterogeneity for the severe malaria subtypes cerebral malaria only and severe malarial anemia only within and across African sites for significant loci.

Bar plots show the distribution of probability between each of nine models of association where the effects on each phenotype are fixed, independent or correlated within a site in combination with being fixed, independent or correlated across all sites. Models are a priori assumed to be equally likely (see the Online Methods and supplementary material for details). Results are shown for SNPs rs334 (HbS, heterozygote model) in HBB, rs8176719 homozygotes (blood group O) in ABO, rs10900585 in ATP2B4, G6PD+202 (rs1050828) in G6PD and rs3092945 in CD40LG.

For HbS and blood group O, there was strong evidence for fixed or correlated effects across different populations and clinical phenotypes (Figs. 1 and 2). HbS confers a remarkably fixed level of protection against cerebral malaria and severe malarial anemia, with heterozygotes for the derived allele showing 89% reduced risk for both conditions when data were averaged across sites (Table 2). HbC and blood group O were less strongly protective than HbS, but they too had very similar effects on cerebral malaria and severe malarial anemia. There are many different theories about the molecular and cellular mechanisms by which HbS, HbC and blood group O act to protect against malaria22,33,34,35,36. The fact that these variants protect equally against clinical complications as disparate as coma and severe anemia implies that they act through some general mechanism, for example, by suppressing parasite density, rather than through a specific effect on particular pathological processes such as cerebral malaria or severe malarial anemia.

At the CD40LG and ATP2B4 loci, there was evidence for heterogeneity of effect across populations (Figs. 1 and 2). For rs3092945 in CD40LG, the posterior probability of independent effects in different populations was greater than 90%, whereas the effects on clinical phenotype were relatively constant. In the case of rs10900585 in ATP2B4, the posterior probability was more evenly balanced between the various models, with some evidence for both fixed and independent effects in different populations. Heterogeneity in effect across populations might indicate some source of biological variation such as epistasis or gene-environment interactions, or it might be due to the associated SNPs simply being genetic markers that have variable patterns of LD with the true malaria resistance alleles. Variable patterns of LD are particularly common in Africa, such that authentic genetic associations may fail to replicate in different locations unless the causal variant is directly genotyped, but in general this form of population heterogeneity might prove useful in the genetic fine mapping of causal variants32,37,38.

The genetic effects observed for G6PD were strikingly different in character from those seen at the HBB, ABO, ATP2B4 and CD40LG loci (Figs. 1 and 2). There was strong heterogeneity of effect across the phenotypes, with more than 80% posterior probability that the G6PD+202T allele had independent effects on cerebral malaria and severe malarial anemia. This difference in effect was observed consistently across the different populations. Previous studies have concluded that G6PD deficiency is protective against P. falciparum, although there has been debate about whether the protective effect is confined to female heterozygotes or is also present in male hemizygotes4,5,25,39. With a much larger sample size than any previous study, our data show that male hemizygotes and female homozygotes have increased risk of severe malarial anemia, whereas male hemizygotes and female heterozygotes have reduced risk of cerebral malaria. If males and females are combined in an additive model, the G6PD+202T allele confers reduced risk of cerebral malaria (OR = 0.91; P = 6.1 × 10−3) but increased risk of severe malarial anemia (OR = 1.19; P = 2.6 × 10−5), and the overall effect on severe malaria is close to neutral (OR = 1.02; P = 0.15) (Table 3 and Supplementary Tables 13, 14, 15, 16, 17).

Discussion

Severe malarial anemia is a complex pathological entity40,41. Parasites invade and destroy erythrocytes as they replicate, and the host response to infection also leads to erythrocyte destruction and bone marrow suppression. In the developing world, the resulting anemia is often aggravated by chronic nutritional deficiency and helminthic infection41,42. G6PD deficiency and the hemoglobinopathies are interesting examples of a biological tradeoff between an inherent tendency to cause anemia and the potential to protect against anemia by protecting against malaria. The present study allows us to observe the outcome of this tradeoff with greater resolution than has hitherto been possible, showing that the risk of severe malarial anemia is significantly reduced for HbS heterozygotes, HbC homozygotes and HbC heterozygotes, whereas for G6PD-deficient male hemizygotes and female homozygotes the risk is significantly increased.

G6PD deficiency has excited much interest among evolutionary biologists because it is so common in the human population and displays such a remarkable diversity of allelic forms throughout the tropics and subtropics3,6,9,23,39,43. The main force for evolutionary selection is widely assumed to be severe malaria due to P. falciparum, but this focus is called into question by the present findings showing that G6PD deficiency has little effect on the overall risk of severe malaria. These data do not exclude the possibility that P. falciparum has had a role, as the present-day risk of cerebral malaria and severe malarial anemia might not accurately reflect patterns of disease and fatality caused by P. falciparum in the past, particularly before antimalarial drugs became widely used. However, it is necessary to consider the possibility of some other evolutionary driving force, and an obvious candidate is Plasmodium vivax, once regarded as benign but increasingly recognized as causing a substantial burden of fatality and severe disease44. In a region of Thailand where both Plasmodium species are endemic, G6PD deficiency has been observed to suppress P. vivax infection more effectively than P. falciparum infection45. An important biological difference of P. vivax relative to P. falciparum is that it preferentially infects reticulocytes (young erythrocytes), which have higher levels of G6PD enzyme activity than older erythrocytes, and it is conceivable that in these circumstances G6PD deficiency might exert a stronger protective effect. The geographical distribution of G6PD deficiency broadly coincides with the transmission range of P. vivax, with the notable exception of large parts of sub-Saharan Africa, where G6PD deficiency is common but P. vivax is absent43,46. It has recently been discovered that chimpanzees and gorillas in Africa carry parasites that are closely related to P. vivax47. Evolutionary analysis of these ape parasite lineages indicates that human P. vivax is of African origin, suggesting that P. vivax might have been common in Africa before the selective sweep of the FYBES (FY*O) allele (Duffy negative blood group) led to its elimination from most of the continent48,49.

Large multicenter studies have transformed the field of human genetics over the past decade, but major practical challenges remain in conducting such studies in the developing world. Obtaining reliable phenotypic data can be difficult in resource-poor settings, particularly for acute conditions whose diagnosis depends on accurate clinical records at the time of illness, such as severe malaria. We have endeavored to overcome these obstacles by establishing systems for standardizing and sharing data from research groups in different countries. The extremely strong phenotypic associations observed for the sickle cell trait and blood group O and their remarkably consistent effects on different clinical forms of severe malaria illustrate the effectiveness of this approach and provide a benchmark for evaluating other loci. A key finding is that most previously reported candidate gene associations do not replicate, and an important use of the sample collections established by this project will be to discover authentic susceptibility loci by conducting GWAS on a larger scale than was previously possible. A critical consideration when conducting multicenter GWAS is that true genetic effects can exhibit marked heterogeneity across different populations, and here we propose a statistical framework for dealing with this problem. There are many potential sources of heterogeneity, including interactions with other infections and environmental variables. A major source of heterogeneity could be parasite genetic variation, whose impact on the clinical outcome of malaria remains very poorly understood, and this is an area of future investigation that could yield important biological insights. Previously, this has been difficult to address in a systematic manner, but with growing knowledge about the natural landscape of genome variation in both the host and the parasite there is the potential to discover new genetic interactions, for example, between parasite ligands and host receptors for erythrocyte invasion, that could be of great practical importance for vaccine development. This work will require a framework for large-scale genetic epidemiology studies, and here we demonstrate the feasibility of integrating data across multiple locations to achieve scientific insights that could not be achieved by individual studies in isolation.

Methods

Cases of severe malaria were recruited on admission to hospital, usually as part of a larger program of clinical research on malaria, designed and led by local investigators. A control group was recruited at each of the study sites to match the ancestry composition of the cases (Supplementary Tables 1, 2 and 4). The control group was intended to be representative of the general population, and cord blood samples were used as controls at several study sites. We describe elsewhere the details of study design at individual sites and local epidemiological conditions, including malaria endemicity (see refs. 21,50,51,52,53 and the MalariaGEN website; see URLs). Following consultation with the Severe Malaria in African Children network and other clinical experts, a standardized case report form was developed to record the clinical features of severe malaria14 (the case report form can be found on the MalariaGEN website; see URLs). This form was not intended to replace local practice but to encourage uniformity in the core data collected across the different sites. A secure web application was developed to enable investigators to upload and curate their data and to transform it into standardized units and format before releasing it to the consortial database. A data fellow was appointed at each site with responsibility for the process of integrating local clinical data with the central database. As part of a capacity-building program, data fellows received training in data management and analysis. Moreover, ethics advice, support and training/capacity building in ethics were provided to data fellows and partners throughout the life of the study.

The normalized clinical data from each study site were combined to ascertain phenotypes in a standardized manner across the entire data set (Supplementary Table 21). A case of severe malaria was defined as an individual admitted to a hospital or clinic with P. falciparum parasites in the blood film and with clinical features of severe malaria as defined by WHO criteria11,12. Severe malaria comprises a number of overlapping syndromes, with the most commonly reported being cerebral malaria and severe malarial anemia. In keeping with standard criteria, cerebral malaria was defined here as a case of severe malaria with a Blantyre coma score of <3 for a child or a Glasgow coma score of <9 for an adult. Severe malarial anemia was defined here as a case of severe malaria with a hemoglobin level of <5 g/dl or a hematocrit level of <15%. In this report, we did not attempt to classify other severe malaria syndromes, such as respiratory distress, that are more complicated to standardize among study sites, although they would be present in our data set. Control samples were either collected from cord blood or, if sampled from the local population, were microscopically negative for malaria.

Ethics.

All studies were conducted under the approval of the appropriate ethics committees, and all participants gave informed consent. Please refer to Supplementary Table 1 and the MalariaGEN website (see URLs) for further details.

Genotyping.

We selected the Sequenom iPLEX MassARRAY platform for genotyping because of its high-throughput capacity for samples, adaptability for assay design and ability to genotype up to 40 SNPs in a single reaction. We chose to design two iPLEX multiplexes as a compromise between maximizing the number of SNPs we could type on any given sample and minimizing the time and cost to genotype all the samples submitted to the MalariaGEN Resource Centre for the various projects.

Altogether, assays for 89 SNPs were finally designed and tested in 2 rounds of multiplex design on 33,138 samples (Supplementary Fig. 2 and Supplementary Tables 5, 6, 7). Of these, 16 SNPs were excluded during the multiplex design and testing phase for a number of reasons: 9 were missing data in more than 20% of the samples; 1 showed a mismatch between the published sequence and the human reference genome; 1 was monomorphic; and 5 could not be redesigned into the final multiplexes (Supplementary Table 7). Of the remaining 73 SNPs genotyped, 55 were included on the basis of a known genetic association with severe malaria (Supplementary Table 5), 3 were used to confirm or type sex and 15 were selected to aid in sample quality control (Supplementary Tables 6 and 22, 23, 24, and Supplementary Note). In the quality control phase, 1 of the 55 SNPs with a known association with severe malaria, rs1800750 (TNF c.–376G>A), showed a large deviation from Hardy-Weinberg equilibrium and was removed from further analysis (see the 'Statistical analysis' section below).

Samples.

A total of 38,926 individual records comprising 16,433 cases of severe malaria and 22,492 controls were obtained from across the 12 study sites (Supplementary Table 2). Clinical data were missing for sex in 4% of records (we confirmed or typed sample sex by genotyping) and for ancestry in 2% of records. A total of 33,138 samples were genotyped. A sample was eligible for inclusion in the analysis only if it was successfully genotyped at more than 90% of the 65 analysis SNPs (excluding ATP2B4 SNPs); we excluded 789 samples on this basis (Supplementary Fig. 6). The majority of sample failures were found to be due to blood storage and DNA extraction issues. After quality control of both phenotypic and genotypic data, 11,890 severe malaria cases and 17,441 controls were included for analysis (Table 1).
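The genotyping call-rate criterion above can be expressed as a simple per-sample filter. In this Python sketch a missing genotype call is represented by None, and the function name and data layout are assumptions made for illustration. With 65 analysis SNPs, a threshold of more than 90% means a sample needs at least 59 successful calls.

```python
def passes_call_rate(genotypes, min_call_rate=0.90):
    """True if the fraction of successfully called SNPs exceeds the threshold.

    genotypes: one entry per analysis SNP; None marks a failed call.
    """
    called = sum(g is not None for g in genotypes)
    return called / len(genotypes) > min_call_rate

N_ANALYSIS_SNPS = 65

# Two hypothetical samples, one with 6 failed calls and one with 7.
sample_ok = ["AA"] * 59 + [None] * 6
sample_fail = ["AA"] * 58 + [None] * 7

print(passes_call_rate(sample_ok))    # 59/65 is above 0.90
print(passes_call_rate(sample_fail))  # 58/65 is below 0.90
```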

There were 213 different ancestry groups, of which 41 comprised at least 5% of the individuals at a study site; these included Mandinka, Jola, Wollof and Fula (The Gambia); Bambara, Malinke, Peulh and Sarakole (Mali); Mossi (Burkina Faso); Akan, Frarra, Nankana and Kasem (Ghana); Yoruba (Nigeria); Bantu and Semi-Bantu (Cameroon); Chonyi, Giriama and Kauma (Kenya); Mzigua, Wasambaa and Wabondei (Tanzania); Chewa (Malawi); Madang and Sepik (Papua New Guinea); and Kinh (Vietnam). For purposes of analysis, we classified ancestry groups with a very small sample size (less than 5% of individuals at any study site) as 'other' (Supplementary Table 4).
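The grouping rule above can be sketched in Python as follows. The sketch assumes one reading of the rule: an ancestry group keeps its own label if it makes up at least 5% of individuals at one or more study sites, and is otherwise relabeled 'other'. The site and ancestry names in the example are placeholders, not data from the study.

```python
from collections import Counter

def major_groups(records, threshold=0.05):
    """Ancestry groups reaching the threshold share at one or more sites.

    records: list of (site, ancestry) pairs, one per individual.
    """
    site_totals = Counter(site for site, _ in records)
    pair_counts = Counter(records)
    return {
        ancestry
        for (site, ancestry), n in pair_counts.items()
        if n / site_totals[site] >= threshold
    }

def relabel_minor_groups(records, threshold=0.05):
    """Replace ancestry labels of small groups with 'other'."""
    keep = major_groups(records, threshold)
    return [(site, anc if anc in keep else "other") for site, anc in records]

# Placeholder data: group Y is small at site A but large at site B.
records = (
    [("A", "X")] * 95 + [("A", "Y")] * 3 + [("A", "W")] * 2
    + [("B", "Y")] * 50 + [("B", "Z")] * 50
)
relabeled = relabel_minor_groups(records)
print(Counter(relabeled))
```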

Statistical analysis.

All statistical analyses were performed using the R statistical software environment (see URLs). As part of the genotyping quality control process, we identified SNPs with large deviations from Hardy-Weinberg equilibrium that might signify assay failure. Overall Hardy-Weinberg equilibrium was assessed from the distribution of Hardy-Weinberg equilibrium P values calculated for each SNP by country and ancestry group, after discarding groups where the calculated allele frequency was less than 5/2N (where N was the number of individuals in the group)54. An assay was marked for potential exclusion if the results deviated from Hardy-Weinberg equilibrium (P < 1 × 10⁻⁴) in more than four ancestry groups (Supplementary Fig. 7).
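The flagging rule above can be sketched in Python as follows. Several choices here are assumptions rather than details from the text: the specific Hardy-Weinberg test is not stated, so a 1-degree-of-freedom chi-square goodness-of-fit test is used, and the 5/2N cutoff is read as requiring at least five copies of the minor allele in a group. All function names are hypothetical.

```python
import math
from typing import Iterable, Tuple

# Genotype counts in one country/ancestry group:
# (homozygous ancestral, heterozygous, homozygous derived)
Counts = Tuple[int, int, int]


def is_informative(counts: Counts) -> bool:
    """Keep groups whose minor allele frequency is at least 5/(2N).

    This is the same as requiring five or more minor allele copies.
    """
    n_aa, n_ab, n_bb = counts
    minor_copies = min(2 * n_aa + n_ab, 2 * n_bb + n_ab)
    return minor_copies >= 5


def hwe_pvalue(counts: Counts) -> float:
    """P value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n_aa, n_ab, n_bb = counts
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of the first allele
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def flag_assay(groups: Iterable[Counts],
               p_threshold: float = 1e-4,
               max_groups: int = 4) -> bool:
    """Flag an assay that fails HWE in more than max_groups informative groups."""
    failing = sum(
        1 for c in groups if is_informative(c) and hwe_pvalue(c) < p_threshold
    )
    return failing > max_groups
```

For example, a group with counts (50, 0, 50) has no heterozygotes despite an allele frequency of 0.5 and fails the test decisively; five such groups would flag the assay, whereas four would not.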

Single-SNP tests, adjusted for HbS genotype, sex and ancestry, for association with severe malaria and the severe malaria subtypes cerebral malaria only and severe malarial anemia only were performed for the 55 SNPs with a known association with severe malaria. Standard logistic regression models were used for tests of association at each autosomal SNP (Supplementary Table 25). Primary analyses comprised tests of association between each SNP and severe malaria phenotypes across all individuals combined as well as separately by sex (X-chromosome SNPs only) and study site: genotypic, additive, dominant, recessive and heterozygote advantage genetic models of inheritance were considered. For X-chromosome SNPs, males were treated as homozygous females. Therefore, when analyzing the males only for X-chromosome SNPs, the genotypic, dominant, recessive and additive models were equivalent and the heterozygous model was redundant: in this case, we present the results from the dominant model (referred to as the male hemizygote model for males at X-chromosome SNPs) and note that the ORs correspond to the change in the odds of disease for males hemizygous for the derived allele in comparison to males hemizygous for the ancestral allele. For combined analyses of males and females at X-chromosome SNPs, robust estimates of variance were used to account for the unequal variance55; all models were then appropriate. In secondary analyses, we considered additional genetic models comparing effects between homozygotes and heterozygotes at selected SNPs. ORs and 95% CIs were derived from Wald tests applied to regression coefficients. Significance was assessed using likelihood ratio tests of association, except in combined analyses of males and females at X-chromosome SNPs, where Wald tests were applied using the robust variance estimates. Results are presented with respect to the association between the derived (non-ancestral) allele and the severe malaria phenotype in question.
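To make the genetic model codings concrete, the Python sketch below shows one conventional way to turn a derived-allele count into regression covariates, with hemizygous males at X-chromosome SNPs coded as homozygous females, and to convert a log OR and its standard error into an OR with a Wald 95% CI. The function names and the exact 0/1/2 coding are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Tuple


def coded_allele_count(copies: int, is_male: bool = False,
                       x_linked: bool = False) -> int:
    """Derived-allele count used for coding (0, 1 or 2).

    Males at X-chromosome SNPs carry 0 or 1 copy and are treated as
    homozygous females, so they are coded 0 or 2.
    """
    if x_linked and is_male:
        if copies not in (0, 1):
            raise ValueError("hemizygous males carry 0 or 1 copy")
        return 2 * copies
    if copies not in (0, 1, 2):
        raise ValueError("diploid genotypes carry 0, 1 or 2 copies")
    return copies


def encode(count: int, model: str) -> Tuple[int, ...]:
    """Covariate(s) for one individual under the named genetic model."""
    if model == "additive":
        return (count,)
    if model == "dominant":
        return (int(count >= 1),)
    if model == "recessive":
        return (int(count == 2),)
    if model == "heterozygote":
        return (int(count == 1),)
    if model == "genotypic":
        return (int(count == 1), int(count == 2))  # separate het and hom terms
    raise ValueError(f"unknown model: {model}")


def wald_or_ci(beta: float, se: float,
               z: float = 1.959964) -> Tuple[float, float, float]:
    """Odds ratio and Wald 95% CI from a log OR and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# A male carrying the derived allele at an X-chromosome SNP.
male_carrier = coded_allele_count(1, is_male=True, x_linked=True)
```

Under this coding a male carrier has the same indicator under the dominant and recessive models and is never coded as heterozygous, which is why the heterozygote model carries no information in male-only analyses.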

Standard logistic regression analyses assume that effects are fixed across all sites. To investigate evidence for genetic heterogeneity across severe malaria subtypes both within and across African sites, we compared different models of association in a Bayesian statistical framework. The models we considered comprised fixed, independent or correlated effects between subtypes within a site crossed with fixed, independent or correlated effects of each subtype across all sites. See the supplementary material for further details. For each SNP, we assumed a normally distributed prior on the log OR of association with a mean of 0 and standard deviation σ, where σ = 1 for rs334 to reflect the prior belief that the effect size is large, consistent with the observed ORs of approximately 0.1; σ = 0.4 for SNPs found to be significant in fixed-effects analysis; and σ = 0.2 otherwise. To model fixed, independent or correlated effects either within or across sites, we set the correlation parameters between subtypes to 1, 0.1 and 0.96, respectively. Multinomial regression was used to make independent maximum-likelihood estimates of the effect of each SNP on these mutually exclusive subtypes for all individuals combined at each African site. Estimates were adjusted for sex, ancestry and, with the exception of rs334, the sickle cell trait. Approximate Bayes factors (ABFs) were then calculated for each SNP and model and used to estimate a posterior probability of each of the models for each SNP.
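The role of the prior standard deviation σ can be illustrated with a simplified, single-parameter approximate Bayes factor of the Wakefield type, followed by conversion of Bayes factors to posterior model probabilities. This Python sketch is an assumption-laden simplification: the models compared in the analysis above are multivariate across subtypes and sites, and the equal prior weight given to each model here is an assumption. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def prior_sd(snp_id: str, significant_in_fixed_effects: bool) -> float:
    """Prior standard deviation on the log OR, following the text."""
    if snp_id == "rs334":
        return 1.0
    return 0.4 if significant_in_fixed_effects else 0.2


def approximate_bayes_factor(beta_hat: float, se: float, sigma: float) -> float:
    """Univariate ABF for association versus no association.

    beta_hat is the maximum-likelihood log OR with standard error se, and
    the prior on the true log OR is Normal(0, sigma^2).
    """
    v = se ** 2     # sampling variance of the estimate
    w = sigma ** 2  # prior variance
    z2 = beta_hat ** 2 / v
    return math.sqrt(v / (v + w)) * math.exp(z2 * w / (2.0 * (v + w)))


def posterior_model_probabilities(bayes_factors: Sequence[float]) -> List[float]:
    """Posterior probability of each model, assuming equal prior weights."""
    total = sum(bayes_factors)
    return [bf / total for bf in bayes_factors]
```

When the estimated log OR is zero the Bayes factor falls below 1, reflecting that the data favor the null over a prior that allows large effects; the larger σ is, the stronger this penalty becomes.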

We also tested for interaction between all pairs of SNPs that were significant in the single-SNP analysis; 25 pairs of markers were tested (Supplementary Table 20). We considered two different statistical models of interaction: (i) a 1-degree-of-freedom 'best model' test for the optimal genetic model for each SNP (as defined by association with severe malaria for all individuals across all sites in a fixed-effects model adjusted for ancestry and sex) at each of the interacting loci and (ii) a more general 'genotype' test using a model that allowed for separate effects for heterozygous and homozygous genotypes at each of the interacting loci. At X-chromosome SNPs, male individuals were treated as homozygous females and only additive effects were considered. For a pair of autosomal SNPs, the genotype test was then a 4-degrees-of-freedom test of interaction; for a pair comprising an autosomal SNP and an X-chromosome SNP, it was a 3-degrees-of-freedom test; and, for a pair of X-chromosome SNPs, it was a 1-degree-of-freedom test. Tests of interaction were performed by testing whether the regression coefficients that represent interaction terms in the corresponding logistic regression model were equal to zero or not. These tests are described in more detail in Cordell56.

URLs.

MalariaGEN Partner Sites, http://www.malariagen.net/projects/cp1; MalariaGEN case report form, http://www.malariagen.net/files/downloads/23.pdf; MalariaGEN ethics and governance, http://www.malariagen.net/community/ethics-governance; R project, http://www.r-project.org/.