Observed and expected frequencies of structural hemoglobin variants in newborn screening surveys in Africa and the Middle East: deviations from Hardy-Weinberg equilibrium

Purpose: Our objective was to compare observed and expected genotype proportions from newborn screening surveys of structural hemoglobin variants. Methods: We conducted a systematic review of newborn screening surveys of hemoglobins S and C in Africa and the Middle East. We compared observed frequencies to those expected assuming Hardy-Weinberg equilibrium (HWE). Significant deviations were identified by an exact test. The fixation index F_IS was calculated to assess excess homozygosity. We compared newborn estimates corrected and uncorrected for HWE deviations using demographic data. Results: Sixty samples reported genotype counts for hemoglobin variants in Africa and the Middle East. Observed and expected counts matched exactly in 27% of samples. The observed number of sickle cell anemia (SCA) individuals was higher than expected in 42 samples, reaching significance (P < 0.05) in 24. High F_IS values were common across the study regions. The estimated total number of newborns with SCA, corrected based on F_IS, was 33,261 annual births instead of 24,958 for the 38 samples across sub-Saharan Africa and 1,109 annual births instead of 578 for 12 samples from the Middle East. Conclusion: Differences between observed and expected genotype frequencies are common in surveys of hemoglobin variants in the study regions. Further research is required to identify and quantify factors responsible for such deviations. Estimates based on HWE might substantially underestimate the annual number of SCA-affected newborns (up to one-third in sub-Saharan Africa and one-half in the Middle East). Genet Med 18 3, 265–274.


INTRODUCTION
Hemoglobin S (HbS) is a structural variant of normal adult hemoglobin (HbA) caused by an amino acid substitution at position 6 of the β-globin gene (HBB c.20A>T; p.Glu6Val). 1 Individuals who have inherited HbS are usually asymptomatic when heterozygous (AS), whereas homozygous individuals with HbS (SS) suffer from sickle cell anemia (SCA), a disease associated with severe clinical complications (including recurring pain, vaso-occlusive crises, and inflammation, all of which can lead to organ damage) and high mortality rates in low-income, high-burden countries. 2 Compound heterozygosity with other β-globin polymorphisms, the most common of which are hemoglobin C (HbC) and β-thalassemia, can also cause sickle cell disease (SCD); individuals with Sβ0-thalassemia suffer from a form of SCD that is clinically indistinguishable from SS. HbS is most prevalent in sub-Saharan Africa and parts of the Mediterranean, the Middle East, and India, because of natural selection for heterozygous individuals through a survival advantage against Plasmodium falciparum malaria. 3 Sickle hemoglobin is often considered to be the most common pathological hemoglobin variant worldwide. Globally, it has been conservatively estimated that 305,800 (confidence interval: 238,400-398,800) babies were born with SS in 2010, in addition to infants with other variants that cause sickle cell disease. 4 Due to population growth, this number increases every year and could reach more than 400,000 by 2050. 4 HbC is another structural variant of HbA caused by a different amino acid substitution at the same position of the β-globin gene (HBB c.19G>A; p.Glu6Lys). 5 Heterozygous individuals (AC) are asymptomatic, whereas homozygosity (CC) causes mild hemolytic anemia due to the reduced solubility of the red blood cells that can lead to crystal formation.
HbC is of clinical significance mainly when inherited in combination with HbS (SC), causing chronic hemolytic anemia and intermittent sickle cell crises slightly less severe or frequent than in SS individuals; when co-inherited with β-thalassemia (hemoglobin C-β thalassemia), it causes moderate hemolytic anemia with splenomegaly. HbC provides nearly full protection against complicated P. falciparum malaria in homozygous (CC) individuals and intermediate protection in heterozygous (AC) individuals. 6 Original research article This variant has been under positive selection across West Africa, particularly in Burkina Faso, Ghana, and Togo, where it reaches frequencies of up to 15%. 7 The Hardy-Weinberg (HW) principle defines the relationship between allele frequencies and genotype counts in successive generations and predicts that in a random mating population of infinite size, allele and genotype frequencies should remain constant from one generation to the next in the absence of any disturbing factors. 8,9 The HW non-evolutionary model is commonly referred to as the Hardy-Weinberg equilibrium (HWE) and represents one of the most basic hypotheses in population genetics and evolutionary biology. 10,11 The estimate of 305,800 infants born with SS disease in 2010 is derived from a model that assumes HWE and projects the frequency of SS based on the frequency of the S allele in population studies. 4 Multiple factors can result in deviations from HWE in population samples. The two primary factors traditionally assumed to account for significant deviations from HWE are inbreeding or consanguinity due to mating among close relatives (endogamy), and population admixture or stratification due to short-and/or long-distance migration (gene flow). 
10 Additional factors that can lead to deviation from HWE include the occurrence of new mutations, genetic drift in small populations, natural selection, and cryptic relatedness in isolated populations, 11 as well as several methodological artifacts, including selection bias, 12 genotyping errors, 13 and non-randomly missing genotypes. 14 The most important of these factors, in addition to the two traditional factors of inbreeding and population stratification, appears to be genotyping errors and selection bias. 15 In countries where consanguinity is uncommon, deviation from HWE is used for quality control in gene association analyses. 16 Several factors can cause deviations from HWE at the same time in a given population, either with cumulative effects or with opposite evolutionary trends. For example, one factor (e.g., inbreeding or improved fitness of homozygotes) may tend to increase the allele frequency of the gene studied while another tends to reduce it (e.g., termination of affected pregnancies or elimination of malaria selective pressure, either naturally or through human control interventions). Furthermore, some of these disturbing factors have an immediate effect on allele or genotype frequencies (e.g., migrations), whereas others will cause slow changes over multiple generations (e.g., natural selection). Importantly, neglecting factors causing deviations from HWE can lead to an incorrect interpretation of the data from screening studies and to an underestimation or overestimation of the number of individuals affected at local, national, regional, or global scales.
In theory, data from universal newborn screening studies offer the unique advantage that they represent the best measure of true genotype frequencies. At ages beyond infancy, excess mortality is likely to reduce the frequency of individuals homozygous for deleterious mutations such as HbS. Furthermore, in areas endemic for malaria, survival among heterozygous individuals (with HbAS) is greater than that among normal individuals (with HbAA) due to their protection against severe malaria. Two main national, regional, and global estimates of newborns affected by sickle cell anemia have been published in the past decade. In 2008, Modell and Darlison published conservative newborn estimates for all common hemoglobinopathies using the HW equation corrected by national population coefficients of consanguinity obtained from Bittles's database and Murdock's ethnographic atlas. 17 In 2013, we published newborn frequency estimates for HbSS, HbAS, HbCC, and HbAC based on a Bayesian geostatistical framework and on HWE that accounted for spatial heterogeneities in the frequency of hemoglobin variants and in the distribution of human populations within countries. 4,7 Although we considered consanguinity as a potential confounder in the previous studies, it was not incorporated in our modeling framework due to the unavailability of a coefficient of consanguinity for many countries and due to the lack of data on subnational variations for this covariate for all countries. We therefore assumed that the use of such a covariate in a framework designed to account for spatial heterogeneities would potentially introduce more biases than corrections. Our published estimates of HbSS and HbCC annual births, calculated using allele frequencies assuming HWE, likely represent underestimates of the prevalence of newborns affected by these genotypes in regions in which consanguinity is common. 
Because our modeling frameworks were independently developed for HbS and HbC, estimates of HbSC annual births were not calculated. Despite not correcting for consanguinity, our global HbSS and HbCC estimates were generally higher than those of Modell and Darlison, even when correcting for demographic changes. Those findings suggested that the global health burden of hemoglobin variants, HbS in particular, might be substantially higher than was believed at the time.
In the present study, we aimed to assess the frequency and magnitude of deviations from HWE for HbS and HbC in Africa and the Middle East by reviewing existing data from newborn screening surveys conducted in these regions, and to discuss the impact of such deviations on estimates of affected newborns at various geographical scales.

Newborn screening data
We conducted a systematic search of the published literature using PubMed, Web of Science, and Scopus using the following search string: "sickle AND newborn AND screening AND (Africa OR Bahrain OR Iran OR Iraq OR Israel OR Jordan OR Kuwait OR Lebanon OR Oman OR Palestine OR Qatar OR "Saudi Arabia" OR Syria OR "United Arab Emirates" OR Yemen OR India)". Although the use of the term "Africa" seemed conservative, individual names of countries were used for the Middle East as the inclusion of this more generic term missed several key references. Searches, last updated on 26 January 2015, returned 159, 121 and 127 references, respectively, in each of these resources. After duplicate removal, 247 references remained, of which 30 contained relevant newborn screening data (see Supplementary Table S1 online). Six studies that did not appear in our systematic search but that had already been identified during preliminary manual searches 18 were also included.
The inclusion criteria for studies were: (i) that subjects were tested at birth or within the first 28 days of life, either randomly or as part of a universal screening program, and (ii) that they reported detailed genotype counts for HbS. No minimum sample size cut-off value was used (accepting that results from small studies need to be considered with caution because of a likelihood of large differences in ratios due to chance). The diagnostic methods used in each screening study were also recorded (Supplementary Table S2 online). Because our search criteria revealed only two relatively small newborn screening studies from India, 19,20 we excluded these studies and restricted our focus to Africa and the Middle East. When more than one publication reported overlapping data, only the most recent or comprehensive report was included. Updated data for the Kumasi survey from Ghana were presented during a Centers for Disease Control and Prevention (CDC) webinar in March 2014 and were therefore used in place of the numbers published in 2008.

Deviation from HWE
HWE equations have been described in detail in several classic textbooks of population genetics 12,21 and are only briefly presented in the Supplementary Information. Various tests have been designed to measure deviation from HWE. Although such a test is relatively straightforward for two alleles, it is more complex for multiple alleles (n ≥ 3). 22 Tests of HWE are often performed using Pearson's χ² goodness-of-fit test. It is now well established that this asymptotic method is unreliable for small samples and is associated with substantial errors in large samples with a low prevalence of mutant alleles. As a result, exact tests are usually preferable. 15,22 Here, we used the likelihood ratio full-enumeration exact test, as implemented in the "HWxtest" package in R. 23 In this test, all possible tables with the same allele numbers as in the observed counts are examined. Although this approach is computationally intensive, it does result in a robust P value. We used 0.05 as the standard cut-off for statistical significance (Supplementary Table S3 online).
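The enumeration idea described above can be illustrated for the simplest case of two alleles. The following Python sketch is not the HWxtest implementation and does not handle the multi-allele tables analyzed in this study; it assumes a biallelic locus, uses the standard conditional probability of a genotype table given its allele counts, and orders tables by a likelihood-ratio (G) statistic. The function names are hypothetical.

```python
from math import exp, lgamma, log


def _log_prob(n_aa, n_ab, n_bb):
    """Log-probability of a genotype table under HWE, given its allele counts."""
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab
    n_b = 2 * n_bb + n_ab
    return (lgamma(n + 1) - lgamma(n_aa + 1) - lgamma(n_ab + 1) - lgamma(n_bb + 1)
            + n_ab * log(2)
            + lgamma(n_a + 1) + lgamma(n_b + 1) - lgamma(2 * n + 1))


def _g_stat(n_aa, n_ab, n_bb):
    """Likelihood-ratio statistic comparing observed counts with HWE expectations."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    g = 0.0
    for obs, exp_count in zip((n_aa, n_ab, n_bb), expected):
        if obs > 0:  # a zero observed count contributes nothing
            g += obs * log(obs / exp_count)
    return 2.0 * g


def hwe_exact_lr(n_aa, n_ab, n_bb):
    """P value: total probability of all tables at least as extreme as observed."""
    n_a = 2 * n_aa + n_ab
    n_b = 2 * n_bb + n_ab
    g_obs = _g_stat(n_aa, n_ab, n_bb)
    p_value = 0.0
    # Heterozygote counts must share the parity of the allele count.
    for het in range(n_a % 2, min(n_a, n_b) + 1, 2):
        aa = (n_a - het) // 2
        bb = (n_b - het) // 2
        if _g_stat(aa, het, bb) >= g_obs - 1e-9:
            p_value += exp(_log_prob(aa, het, bb))
    return min(p_value, 1.0)
```

For a biallelic locus the number of tables grows only linearly with the minor allele count, so full enumeration is cheap; the computational cost mentioned above arises with three or more alleles.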

Fixation index
The fixation index F_IS (often referred to as Wright's inbreeding coefficient) is used with genomic data on the distribution of alleles in population samples to identify gaps between the relative frequencies of homozygous or compound heterozygous individuals and those predicted on the basis of heterozygous individuals. 24 Although typically interpreted as a measure of consanguinity at the population level, it can also indicate potential quality problems in a genetic study due to issues such as selection bias or genotyping errors. A high F_IS index can be interpreted as a measure of consanguinity or inbreeding only in a population with direct evidence of high levels of consanguinity as well as reliable data on admixture and population structure.
The following equation, defining the proportionate reduction in heterozygosity relative to HWE, was used to calculate F_IS in each population survey included in this study (Supplementary Table S3 online). 25 (Equation 1): where p_i is the proportion of allele i, and P_ij is the observed frequency of genotype A_iA_j.
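Equation 1 itself is not reproduced in this text. A form consistent with the description above (the proportionate reduction in heterozygosity relative to HWE) is F_IS = 1 − H_obs / H_exp, where H_obs is the sum of the observed heterozygote frequencies P_ij (i ≠ j) and H_exp = 1 − Σ p_i². The Python sketch below assumes that form; the function name and the input layout are hypothetical, not taken from the study.

```python
def fixation_index(genotype_counts):
    """Estimate F_IS as 1 - H_obs / H_exp from genotype counts.

    genotype_counts maps an allele pair, e.g. ("A", "S"), to the number of
    individuals carrying that genotype (assumed input layout).
    """
    total = sum(genotype_counts.values())
    allele_counts = {}
    het_obs = 0
    for (a1, a2), n in genotype_counts.items():
        allele_counts[a1] = allele_counts.get(a1, 0) + n
        allele_counts[a2] = allele_counts.get(a2, 0) + n
        if a1 != a2:
            het_obs += n
    h_obs = het_obs / total
    # Expected heterozygosity under HWE: 1 minus the sum of squared allele frequencies.
    h_exp = 1.0 - sum((c / (2 * total)) ** 2 for c in allele_counts.values())
    if h_exp == 0:
        raise ValueError("F_IS is undefined when only one allele is present")
    return 1.0 - h_obs / h_exp
```

With this definition, a sample in exact HWE gives a value of zero, an excess of homozygotes gives a positive value, and an excess of heterozygotes gives a negative value.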

Newborn estimates at population level
To assess the influence of deviations from HWE on population-level estimates of newborns with Hb variants, we calculated the predicted number of HbSS newborns based on expected (HWE) and observed (HWE corrected using F_IS) allele frequencies for study samples or subsamples stratified based on locality, citizenship, or place of origin. Due to high heterogeneities in F_IS within countries, national estimates would be misleading even for those few studies with samples from multiple localities within a country. As in previous studies, 7,26 demographic data included local population and national crude birth rate (CBR). Population data were extracted from one of the following sources, investigated according to the following sequence: UN Demographic Yearbook 2013 ( : where N_i is the local population; CBR_i is the crude birth rate; p_i is the allele frequency of HbS; q_i, the frequency of HbC, is equal to 1 − p_i; and F_IS,i is the fixation index at place i.
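The equations for the uncorrected and corrected estimates are not reproduced in this text. The Python sketch below assumes the standard textbook form, in which the homozygote frequency under inbreeding is p² + F_IS·p·(1 − p), and assumes that the crude birth rate is expressed per 1,000 population. The function name and the input values in the usage note are hypothetical.

```python
def expected_ss_births(population, cbr_per_1000, p_s, f_is=0.0):
    """Annual HbSS births for one locality.

    population    local population (N_i)
    cbr_per_1000  crude birth rate, assumed to be births per 1,000 population
    p_s           HbS allele frequency (p_i)
    f_is          fixation index; 0.0 gives the plain HWE estimate
    """
    annual_births = population * cbr_per_1000 / 1000.0
    # Homozygote frequency with the inbreeding correction (assumed standard form).
    ss_frequency = p_s ** 2 + f_is * p_s * (1.0 - p_s)
    return annual_births * ss_frequency
```

For a hypothetical locality of 1,000,000 people with a CBR of 40 and an HbS allele frequency of 0.10, the function returns about 400 HbSS births under HWE and about 580 with F_IS = 0.05.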

Newborn screening surveys
We identified 36 published surveys of newborns tested for structural hemoglobin variants (HbS and HbC) in the study regions matching our inclusion criteria. These surveys included 60 samples or subsamples; 40, 5, and 15 of which were from surveys conducted in sub-Saharan Africa, North Africa, and the Middle East, respectively. Sample sizes ranged from 30 newborns in a survey conducted in Tanzania to more than half a million newborns tested over a decade in the United Arab Emirates; 57% of samples or subsamples had a size smaller than 1,000 newborns (Figure 1).
HbS allele frequencies within individual study samples ranged between 1.4% (Rwanda) and 14.6% (infants of Democratic Republic of the Congo (DRC) origin living in Burundi and DRC) in sub-Saharan Africa, between 0.3 and 2.8% in Tunisia, and between 0.5% (United Arab Emirates) and 13

Deviation from HWE
Although observed and expected genotype counts matched exactly in 16 study samples (27%), the observed number of HbSS individuals was higher than expected in 42 of them. The observed number of HbCC individuals was higher than expected in 29% (n = 8) of study samples in which HbC was found (Figure 2). A statistically significant difference (P < 0.05) between observed and expected HbS counts was found in 24 of 60 samples or subsamples: 1 in Burkina Faso; 6 out of 15 in the DRC; 1 in Gabon; 1 in Ghana; 1 out of 2 in Nigeria; 1 out of 4 in Senegal; 1 out of 5 in Tunisia; 1 out of 3 in Bahrain; 1 in Lebanon; 1 in Oman; 5 out of 5 in Saudi Arabia; and 3 out of 3 in the United Arab Emirates (Supplementary Table S3 online). Of 29 samples or subsamples with more than 1,000 newborns tested, 20 (69%) revealed differences between observed and expected frequencies of homozygotes that were statistically significant.
Table S3 online). For most countries, substantial differences in F_IS were observed between various national population samples (Supplementary Table S3 online). For example, F_IS ranged from −11.1 to 2.4% in Burundi, from −5.7 to 6.4% in Tanzania, and from −0.9 to 21.2% in Bahrain.

Impact on estimates of prevalence of affected newborns
Figure caption (fragment): Surveys for which only data on HbS were presented are not shown. To maximize the visualization of the differences, each axis is scaled based on the maximum of the observed or expected counts for each survey independently. n = number of newborns tested. *A population sample for which deviation from Hardy-Weinberg equilibrium was found to be statistically significant using the likelihood ratio full-enumeration exact test.
Because most F_IS values were positive for the samples surveyed, estimates of the prevalence of HbSS in newborns corrected for the fixation index (i.e., based on Equation 3) were higher than the uncorrected estimates calculated using the simple HWE equation. In three large (n > 30,000) studies from sub-Saharan Africa, the corrected to uncorrected HbSS prevalence estimate ratios were approximately 1.4 in the DRC and Ghana and 1.0 in Angola (Table 1). In the Middle East, the ratios were considerably higher in screening studies conducted in Lebanon (11.4) and the United Arab Emirates (up to 17.7), whereas in Bahrain and Saudi Arabia ratios have fallen from 3.4 or more during the 1980s to 1.0-1.5 in studies conducted more recently. Across all surveys, the estimated total numbers of HbSS newborns, corrected based on F_IS, were 33,261 instead of 24,958 in sub-Saharan Africa and 1,109 instead of 578 in the Middle East. Corrected and uncorrected estimates of annual births with HbCC and HbSC are presented in Supplementary Table S4 online for the 20 samples for which the frequency of HbC was reported and not null. Although most of these estimates need to be considered with caution due to the relatively small number of births affected by HbCC or HbSC (compared to HbSS), they suggest that uncorrected estimates based on HbC alleles assuming HWE could substantially underestimate the number of HbCC births and slightly overestimate the number of HbSC births.
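The dependence of the corrected-to-uncorrected ratio on allele frequency can be made explicit. The derivation below assumes the standard inbreeding-corrected homozygote frequency p² + F_IS·p·(1 − p), which may differ in notation from Equation 3; the numerical values of p and F_IS are illustrative only and are not taken from any survey.

```latex
\frac{P_{SS}^{\mathrm{corrected}}}{P_{SS}^{\mathrm{HWE}}}
  = \frac{p^{2} + F_{IS}\,p\,(1-p)}{p^{2}}
  = 1 + F_{IS}\,\frac{1-p}{p}

% Illustrative values (not survey data), with F_{IS} = 0.02:
%   p = 0.05  gives  1 + 0.02 \times 19  = 1.38
%   p = 0.005 gives  1 + 0.02 \times 199 = 4.98
```

Under this form, the same fixation index inflates the estimate far more where the allele is rare, which is in line with the large ratios reported for low-frequency settings. For the pooled totals reported above, the ratios are 33,261/24,958 ≈ 1.33 for sub-Saharan Africa and 1,109/578 ≈ 1.92 for the Middle East.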

DISCUSSION
This study, which to our knowledge includes all relevant published studies across the study regions, reveals a striking lack of newborn screening surveys in the areas most affected by sickle cell disorders. Universal newborn screening programs for sickle cell disorders have long been implemented in the United States and the United Kingdom, where they have led to substantial reductions in childhood mortality. 27,28 More generally, universal newborn screening can contribute to the reduction of health disparities. 29 Ghana is the only African country so far that has implemented a large-scale newborn screening program, with data published from Kumasi and a neighboring community. Local screening efforts have begun in a number of African countries, but those often lack financial and political support to be scaled up to subnational or national levels. 30 This is reflected by the rather small sample sizes of most newborn screening studies conducted in sub-Saharan African countries, with 30 of 40 samples or sub-samples (75%) having tested fewer than 1,000 newborns and only three studies reporting results for more than 30,000. With the ongoing increase of the health burden of hemoglobinopathies, the implementation of large-scale newborn screening programs could contribute to saving millions of lives in the coming decades. 4 In countries for which several newborn screening studies have been published, the present study highlights heterogeneities in the allele frequencies of hemoglobin variants and the magnitude of deviation from HW expectations. Although the former can to some extent be taken into account using geostatistical methods, 26 to date the latter has been underappreciated in the calculation of national, regional, and global burden estimates.
Differences between observed and expected frequencies of genetic disorders based on allele frequencies within a study sample are common in newborn screening surveys of structural hemoglobin variants in Africa and the Middle East. Various factors can explain an excess of homozygosity, including consanguinity and population structure, but their respective effects are difficult to dissect. These results give insight into the relative magnitudes by which current national and regional estimates based on extrapolations from the frequencies of HbS alleles might represent underestimates of the birth prevalence of sickle cell disease. The magnitude of underestimation may be as much as one-third in African populations and almost one-half in Middle Eastern surveys. Regrettably, we did not find substantial data from screening studies conducted in India and therefore could not extrapolate this conclusion to that country.
Surprisingly high F_IS values found in small study samples from Gabon and Senegal suggest either quality control issues or chance variation. The threats to quality and potential bias in estimates of F_IS include small sample size, selection bias, and mistaken genotyping. The Gabon study illustrates the first problem, with fewer than 100 newborns tested, a sample size that is inadequate for reliable estimates to be made. Despite a larger sample size (n = 479), the high F_IS coefficient found in one sample of Senegalese newborns should be interpreted with caution because the reported frequency of SS (1.9%) is several times higher than reported in all other studies from Senegal (0.3-0.5%). This finding could potentially be the result of referral or selection bias or due to diagnostic errors.
An apparent excess of homozygotes is often attributed to consanguinity, which is assumed to be high in populations living in the Sahel, other parts of West Africa, the Middle East, and Central/South Asia. 31 Nevertheless, data on consanguinity are very patchy. 31 Existing national estimates of consanguinity may reflect the behavior of specific localized ethnic groups studied, rather than that of the overall population of a country. In much of sub-Saharan Africa, however, consanguinity is reported to be widespread and may be found in similar frequencies among different ethnic groups in a country. 32 Moreover, estimates of the frequency of consanguineous marriage may differ widely for a single ethnic group. For example, two published estimates of consanguinity among the Yoruba living in southwestern Nigeria were 51% in a rural sample in the 1970s and 6% in an urban sample in the 1990s. 33 Consanguinity is generally thought to be common in many parts of the Middle East. 31 For example, well over one-half of marriages among Saudi nationals in Saudi Arabia are consanguineous, with one-third of all marriages between first cousins. 34 As illustrated by Figure 2, we found a marked excess of SS homozygotes in most of the earlier surveys conducted in Bahrain, Oman, Saudi Arabia, and the United Arab Emirates.
In Bahrain, data have shown a reduction over time of the excess homozygosity, an observation that may be accounted for by more recent introduction of widespread carrier testing. The excess of SS homozygotes observed in 1984-1985 had essentially disappeared in screening data collected in 2002 and 2008-2010. The introduction of prenatal carrier screening, which began in 1993, student screening, which began in 1998, and voluntary premarital screening and counseling of couples identified as carriers (mandatory for Bahraini citizens beginning in 2004) appear to have driven this trend. Between 1984-1985 and 2010, the prevalence of HbSS in Bahraini newborns decreased from 2.1% to 0.4%, whereas that of HbAS increased from 11.2 to 14.7%. The public health implication is that the direct influence of parental consanguinity on high rates of sickle cell disease relative to the prevalence of the sickle cell allele can be attenuated if carrier testing leads to fewer carrier couples marrying and having children together. In various Mediterranean countries, programs combining premarital screening, genetic counseling, and prenatal diagnosis for carrier couples offered during early pregnancy long ago resulted in 90% or more decreases in live births with thalassemia. More recently, a similar program in parts of coastal Turkey resulted in similar decreases in births for both thalassemia and sickle cell disease. 35 In the Middle East, attitudes toward prenatal diagnosis and abortion are greatly influenced by religious values. Education about a fatwa, which allows the abortion of a diseased fetus within the first 120 days of a pregnancy, can promote the acceptance of genetic counseling in Islamic societies. 36
Ethnographic or genealogical approaches to assessing consanguinity often yield results that differ markedly from genome-based measures of supposed interbreeding. 33 Differences in the presumed frequency of inbreeding suggested by the F_IS coefficient often reflect the influences of genetic isolation and genetic drift, including past population bottlenecks and founder effects rather than consanguineous marriage. 37 Differences in the F_IS coefficient may also reflect differences in study quality, including mistaken attribution of genotype due to testing errors and selection bias. A study that analyzed data from 26 populations around the world with both F_IS coefficients and estimates of consanguinity found that the latter explained a little more than one-tenth of variation in the former (r = 0.349, P = 0.040). 37 One of the highest ethnographic estimates of consanguineous marriage in Nigeria (56-61%) comes from rural Hausa communities in the northern part of the country. 38 Although F_IS coefficients are moderately correlated with ethnographic estimates of marital consanguinity, 37 it has not been established that the same association holds true among studies in sub-Saharan Africa.
An excess of homozygotes or deficit of heterozygotes can also be explained by population genetic structure or stratification, also referred to as the Wahlund effect. 39 Population stratification can result from random mating within sub-populations of distinct ancestral origins. In population stratification, each subpopulation may be characterized by HWE, but an aggregate sample from the stratified population can show deviation from HWE and apparent inbreeding or assortative mating. Population stratification is particularly likely to occur in countries or regions in which hemoglobinopathies were historically rare but now have distinct high-frequency subpopulations due to historical or recent population migrations. 40 In particular, population stratification is present in heterogeneous populations in North America and Western Europe in which most cases of sickle cell disease occur among specific subpopulations with a migration history or ancestry from areas in which the HbS allele has been present for millennia. In African and Middle Eastern countries with heterogeneous allele frequencies of hemoglobin variants, population stratification can be expected in large cities to which people migrate. This is illustrated by Tshilolo's 2009 study of newborns in Kinshasa in which HbS allele frequencies varied between 4.3 and 11.4% in different subpopulations, and the F_IS ranged from −4.5 to 6.9. Although screening studies usually provide little information about the actual structure of local populations, the inclusion in future analyses of data from national censuses and/or genome-wide association studies 41 could potentially contribute to a better understanding of the factors underlying HWE deviations observed and to improving current and future newborn estimates.
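The Wahlund effect described above can be illustrated numerically. The Python sketch below pools subpopulations that are each assumed to be in exact HWE and computes the apparent F_IS of the aggregate sample. The two allele frequencies in the usage note are the extremes of the range quoted for Kinshasa; the equal subpopulation sizes and the function name are hypothetical.

```python
def pooled_f_is(subpopulations):
    """Apparent F_IS of a pooled sample of subpopulations that are each in HWE.

    subpopulations is a list of (size, allele_frequency) pairs.
    """
    total = sum(size for size, _ in subpopulations)
    # Allele frequency in the pooled sample (size-weighted mean).
    p_bar = sum(size * p for size, p in subpopulations) / total
    # Heterozygote frequency actually present in the pooled sample.
    het_observed = sum(size * 2 * p * (1 - p) for size, p in subpopulations) / total
    # Heterozygote frequency expected if the pooled sample were one HWE population.
    het_expected = 2 * p_bar * (1 - p_bar)
    return 1.0 - het_observed / het_expected
```

Pooling two equally sized groups with HbS allele frequencies of 0.043 and 0.114 gives an apparent F_IS of about 0.017, even though neither group departs from HWE. The value equals the variance of the allele frequency across groups divided by p̄(1 − p̄), so it grows with the heterogeneity between subpopulations.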
Finally, the sensitivity and specificity of the diagnostic methods used as well as the main objectives of screening surveys represent another potential source of deviation. Most newborn screening programs attempt to minimize the chances of false positives and false negatives for individuals with sickle cell anemia, leading to strict diagnosis confirmation and minimal errors; however, the limited clinical implications of a misdiagnosis of a heterozygote carrier as a wild-type homozygote and vice versa might not justify the expenses of further confirmation, leading to more frequent misclassification of AS and AA than SS. Although we recorded the diagnostic method used in each survey, further work is needed to identify the precise role of this parameter on deviations from HWE.
In conclusion, taking potential deviation from HWE into account in assessments of the prevalence of hemoglobinopathies could have several benefits. First, it would help refine existing estimates of numbers of SCA-affected newborns at national, regional, and global levels. Owing mostly to the limited availability of data on consanguinity, the most recent published global estimates of the prevalence of HbSS 4 presumed HWE. That could have led to a substantial underestimation of the number of HbSS annual births in some countries, as suggested by the present study. Second, the refined model presented here allows for improved estimation of numbers of SCD-affected newborns, including both HbSS and HbSC, based on allele frequencies, although other SCD variants are still excluded. Third, the prevention of these disorders could be improved by the adoption of appropriate policy measures tailored to the genetic, social, and cultural factors responsible for HWE deviation. Finally, further analyses of the respective contributions of various factors responsible for deviations from HWE might reveal interesting insights into the evolutionary dynamics of these genes in different populations across the globe.

SUPPLEMENTARY MATERIAL
Supplementary material is linked to the online version of the paper at http://www.nature.com/gim
data about the surveys described in Diallo, 2008. F.B.P. has been partly supported by an advanced grant (Diversity) from the European Research Council. T.N.W. is funded by a senior clinical fellowship from the Wellcome Trust (091758). This report was submitted with permission of the Director of the Kenya Medical Research Institute. The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the United States Centers for Disease Control and Prevention.