INTRODUCTION

Dengue is one of the most important arboviral infections because of its global distribution, its prevalence, and the gravity of its most severe presentation, dengue hemorrhagic fever (DHF). Most infections are asymptomatic,1, 2 but 100 million each year have dengue fever (DF), which frequently presents as an acute, weeklong febrile illness with severe joint pains that is followed by a month of generalized weakness and lack of well being. An additional 1% of those symptomatic with dengue virus (DENV) infection will develop DHF, where there is spontaneous bleeding and at times shock and death. Most cases of DHF occur in individuals who have had earlier infection with one or more of the four viral serotypes or who have non-neutralizing maternal antibodies.1 In many endemic areas, such as Brazil, the majority of individuals have had multiple infections,2 yet, only 2–4% of those with secondary infections will develop DHF.3 Additional host or viral factors, therefore, must be involved.

Although there may be additional epidemiological associations with DHF, such as age, some chronic diseases, and nutritional status,4 a contribution from host genetics to the risk for DHF is suggested by ethnicity-based differences in disease rates.5, 6 Populations of African origin7, 8 as well as other ethnic groups5, 7, 8, 9 seem to be relatively resistant. Genetic association studies have also pointed to associations with multiple HLA loci10 and, most recently, polymorphisms in the gene encoding CD209 (DC-SIGN).11

In 2002, the murine flav locus, a locus responsible for resistance to all flaviviruses in mice, was identified as the OAS1b gene.12, 13 As this gene in mice differs substantially from its homolog in human beings, we took this finding as more general evidence that elements in the type 1 interferon (IFN) response pathway can significantly influence the clinical presentation of dengue. We, therefore, chose to study a range of genetic candidates, in particular those of the IFNα response pathway, while controlling for important non-genetic variables also associated with DHF.

MATERIALS AND METHODS

Recruitment

During Brazil's most active dengue transmission year to date (2002–2003), the State Secretariat of Health of Bahia identified 91 potential cases of DHF by active surveillance in the hospitals of the city of Salvador. Of these, 65 met the initial study criteria for suspected cases, that is, a dengue-like syndrome, spontaneous bleeding, and thrombocytopenia. Using World Health Organization population-based criteria,6 31 had evidence of a capillary leak syndrome as defined by Brazilian Ministry of Health norms for hemoconcentration (http://dtr2001.saude.gov.br/svs/pub/GBDIP/guia_bolso_4ed.pdf) and a hemagglutination inhibition (HI) serology consistent with dengue infection. HI is not specific among flaviviruses, and the serology was not conducted in the acute phase; thus, these cases represent hospitalized probable DHF cases. An additional 34 met all criteria, but did not have data for determination of hemoconcentration. These were classified as hospitalized possible DHF. All of the cases were consistent with DHF grade II, with spontaneous bleeding but no evidence of shock or circulatory failure (Figure 1). There were no reported deaths in Salvador because of dengue in 2002. In 2004, 55 of the 65 probable or possible DHF cases were contacted at home and agreed to enter the study. Five of these were excluded because they were seronegative for DENV, which left 20 possible and 30 probable hospitalized cases of DHF for analysis. For each DHF case, at least four neighbors who had symptoms of dengue in 2002 or 2003 and four who claimed never to have had symptoms were also enrolled. Symptomatic cases conformed to the 1997 WHO classification of probable dengue febrile illness (DF). These controls were all identified within 1 km of their index case and generally were within 100 m.

Figure 1

Recruitment. Potential DHF cases – identified by the Bahia State Secretariat of Health from febrile hospitalized patients during the 2002 and 2003 dengue transmission seasons based on hospitalization and severity of disease. Excluded – failed to show documented clinical criteria: spontaneous bleeding and thrombocytopenia (platelets <100 000) plus one or more of the following: headache, retro-orbital pain, muscle pain, joint pain, rash, and leukopenia. Possible DHF – hospitalized patients with all of the clinical criteria, but no documentation of hemoconcentration by Brazilian Ministry of Health hematocrit criteria: children up to 12 years >38%, women >40%, men >45%. Probable DHF – all of the WHO criteria for DHF documented. Probable DF – neighbors of suspected or confirmed DHF who claimed to have symptoms of dengue (see excluded) during January–May of 2002 or 2003. Possible Asymp (asymptomatic) – neighbors of suspected or confirmed DHF who claimed never to have had symptoms consistent with DF or DHF. Seronegative – IgG hemagglutination inhibition titer <1:20. Genotyped – all suspected and confirmed DHF cases enrolled were genotyped. Some enrolled probable DF and Asymp controls selected at random were not genotyped because of limitations in genotyping capacity.

Relatives and household members of cases or controls were excluded. The controls were matched for age either less than or greater than 15, but not for sex, as there is usually little or no difference for DHF.5, 14 In addition to the interview for demographic, socioeconomic, and environmental data, blood was collected for genomic DNA and dengue serology. Written informed consent was obtained from all participants and/or their guardians. The study was approved by the ethical review boards of University Hospitals of Cleveland, the Oswaldo Cruz Foundation, Bahia, and the National Commission on Ethics in Research (CONEP), Brazilian Ministry of Health, and followed the guidelines of the US Department of Health and Human Services.

Socioeconomic and environmental variables

During a home visit, trained interviewers recorded answers on a standardized form to questions about schooling, household income, household appliances, water sources, water storage, sewage disposal, and home environment. Self-reported ethnicity was solicited according to the classification (white, mixed, black, Indian/Asian) recognized by the Brazilian Census (Instituto Brasileiro de Geografia e Estatística-IBGE). Unlinked markers were used to estimate percent genetic ancestry with the software program Structure 2.115 under the admixture model.16 ‘Income’ was reported in units of monthly minimum salaries (1 MMS ≈ US$97.03 in 2004) as set by the Brazilian government. Income was significantly correlated with variables related to formal education and disposable income by linear regression analysis (in decreasing order of significance – computer, water filter, years of schooling, internet, open water containers, washing machine, freezer, household planters, planters observed in the household, cell phone). The reported income by ethnic categories also paralleled that reported to the Brazilian census for the city of Salvador (Supplemental Figure 1). Income was not recorded for 13% (n=83) of those interviewed, so an income index was created using the sum of coefficients from linear regression analysis of demographic and environmental variables with income as the dependent variable.16

Serology, genomic DNA isolation, and quantitation

All plasma samples were screened at 1:20 (initial dilution) at the Evandro Chagas Institute (a WHO Collaborating Center for Arbovirus Reference and Research) in Belém, Pará, Brazil using the HI test as modified by Shope and Sather.17 When the response to a dengue serotype was four times greater than other flaviviruses or ≥1280, the HI response was considered positive. All serum samples were also tested by IgG ELISA for dengue, yellow fever, and other flaviviruses circulating in Brazil.

Genomic DNA was extracted from 2 ml of buffy coat by QIAamp Flexigen kit according to the manufacturer's instructions and quantified by spectrophotometer. All samples were diluted to a final concentration of 100 ng/μl before genotyping.

Single-nucleotide polymorphisms selection and genotyping

We genotyped 728 single-nucleotide polymorphisms (SNPs) in 56 important genes of the type 1 IFN response pathway or with potential for association with dengue severity (Supplemental Table 1). Genes were selected for their function in viral sensing, signal receptors, signal transduction, effector or suppressor activity, or involvement suggested by the literature. We sought to identify 20 markers in each gene based on markers at the SNPper website (http://snpper.chip.org/) in 2006. Markers separated by at least 60 bases and within a region from 500 bases upstream of the translational start to the polyadenylation signal were targeted. SNPs with earlier validation on dbSNP at the NCBI website (http://www.ncbi.nlm.nih.gov/) were preferred. HapMap data were not available at this time. If the minor allele frequency was available, we considered those with minor allele frequency >0.05. We also included SNPs located in the promoter, intron–exon boundaries, published regulatory motifs in the promoter, and 3′ UTR,10 as well as SNPs with known functional significance or genetic association. To aid ancestry estimates, 40 ancestry informative markers18 were selected for genotyping.

For genotyping, 250 ng of DNA was arrayed in 96-well plates and processed using Illumina's (Illumina, CA, USA) microbead array technology.19 Genotypes of six replicates and two trios were used to guide the clustering and calculate error rates. The trios consisted of multi-ethnic families from Salvador not identified through their dengue status and composed of father, mother, and son. Two analysts interpreted the genotyping results using GenCall software v. 6.2.0.4 (Illumina). As recommended by Illumina, the limit for low GenCall Score was set at 0.25, and loci with lower scores were excluded.

Data analysis

Cases of hospitalized possible DHF and hospitalized probable DHF were analyzed separately and combined. Although those reporting never experiencing dengue are likely to have had milder symptoms, some may have been merely stoic or not very knowledgeable about the disease. Therefore, the asymptomatic group was also analyzed separately. Genotyping error rates were calculated from inconsistencies among six replicates per plate, from male hemizygosity for X-linked markers, and from offspring genotypes inconsistent with parental genotypes for the trios. Linkage disequilibrium (LD) was calculated with the program Haploview.20 The Bonferroni correction for multiple tests was calculated for all independent SNPs. Independent SNPs were considered as those not located within haplotype blocks as defined in Haploview along with SNPs selected at random from within the haplotype blocks (n=305). In addition, the q-value, a measure of statistical significance for the false discovery rate (FDR),21 was calculated using the program Qvalue.22
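As an illustration of the multiple-testing thresholds described above, the following Python sketch computes the Bonferroni per-test cutoff for 305 independent SNPs at α=0.05 (the α level is an assumption; it is not stated explicitly in the text). It also shows a Benjamini-Hochberg adjustment as a simple stand-in for FDR control. This is not the q-value procedure implemented in the Qvalue program, which additionally estimates the proportion of true null hypotheses. The p-values in the example are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Stand-in for FDR control; NOT the Storey q-value used by Qvalue.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# 305 independent SNPs comes from the text; alpha = 0.05 is assumed.
threshold = bonferroni_threshold(0.05, 305)

# Hypothetical p-values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted = benjamini_hochberg(example_p)
```

Under these assumptions the Bonferroni cutoff is 0.05/305, or about 1.6 × 10−4.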

A sliding window approach, implemented in the DECIPHER package of S.A.G.E. 5.4.2,23 was used to analyze the association with contiguous haplotypes. Haplotype windows or sets were formed from the first SNP in a selected region and the next n–1 SNPs, where n indicates the size of the combined SNPs or window size. The analysis was repeated with a haplotype set consisting of the second SNP and the next n–1 SNPs in the region. The process was continued until the window consisted of the last n markers in the region. Haplotype frequencies were estimated using the expectation-maximization algorithm, and a likelihood ratio test (LRT) was used to compare the distribution of haplotypes between groups. The likelihood ratio asymptotically follows a χ2 distribution, and this distribution is conservative when there are rare haplotypes, thus an empirical P-value for the LRT statistic was also computed. Alleles in a single gene showing significant association with DHF were also grouped as a haplotype and analyzed using logistic regression. Genotypic and haplotypic analyses were controlled for sex, age, ancestry, and income.
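The window construction described above can be sketched in a few lines of Python. This is only an illustration of how the haplotype sets are formed, not the DECIPHER implementation; the function name and marker labels are hypothetical, and the haplotype frequency estimation and LRT steps are not shown.

```python
def sliding_windows(markers, n):
    """Return every run of n contiguous markers, in map order.

    The first window starts at the first marker; the last window
    consists of the final n markers in the region.
    """
    if n < 1:
        raise ValueError("window size must be at least 1")
    return [tuple(markers[i:i + n]) for i in range(len(markers) - n + 1)]


# Hypothetical marker labels in map order.
region = ["snp1", "snp2", "snp3", "snp4", "snp5"]
windows_of_3 = sliding_windows(region, 3)
```

For a region of m markers and window size n, this yields m − n + 1 windows, each of which would then be tested for association.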

The association between disease severity and other variables was tested using multivariable logistic regression with the program SPSS v. 10 (SPSS Inc., IL, USA). Each marker in the JAK1 gene was tested for associations with DHF controlling for sex, age, proportion of African genetic ancestry, and an index of income. The percent African ancestry was earlier estimated for each individual16 using the program Structure,15 and this proportion was used as an independent variable in the regression analysis. Genotypes for the markers associated with DHF were tested using dominant, recessive, and codominant models.
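The three genetic models differ only in how a genotype is coded before it enters the regression. The Python sketch below shows one common coding convention; it is an assumption about the coding, not a description of the SPSS setup used in the study, and the function name and example genotypes are hypothetical.

```python
def code_genotype(genotype, risk_allele, model):
    """Code a two-character genotype (e.g. "TT") as regression covariates.

    dominant:   1 if at least one copy of the risk allele, else 0
    recessive:  1 only if two copies of the risk allele, else 0
    codominant: two indicators (heterozygote, risk homozygote), so each
                genotype has its own effect relative to the reference
    """
    copies = genotype.count(risk_allele)
    if model == "dominant":
        return (int(copies >= 1),)
    if model == "recessive":
        return (int(copies == 2),)
    if model == "codominant":
        return (int(copies == 1), int(copies == 2))
    raise ValueError("unknown model: " + model)


# Hypothetical genotypes at a T/C marker with T as the risk allele.
dominant_codes = [code_genotype(g, "T", "dominant") for g in ("TT", "TC", "CC")]
recessive_codes = [code_genotype(g, "T", "recessive") for g in ("TT", "TC", "CC")]
codominant_codes = [code_genotype(g, "T", "codominant") for g in ("TT", "TC", "CC")]
```

The resulting columns would be entered alongside sex, age, ancestry, and the income index as independent variables.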

RESULTS

Population and genotyping

From active hospital surveillance in Salvador, Bahia, Brazil, 55 of those with hospitalized/possible DHF or hospitalized/probable DHF in 2002 were contacted and agreed to participate in the study. Five (8%) of those with hospitalized/possible DHF or hospitalized/probable DHF were eliminated because they were seronegative for dengue as compared with 9 and 12% for the outpatient DF and asymptomatic groups, respectively. Although it was not possible to determine the infecting serotype, 80% of the viral recoveries at the State's Central Laboratory were DENV-3 in 2002 (P Melo, personal communication, master's thesis). The DF households had significantly lower income than any of the other groups. Otherwise, there were no statistically significant differences for age or sex between the clinical categories of dengue. Although the hospitalized/possible DHF group were younger, they did not otherwise differ from the confirmed cases for income, ancestry (Table 1) or allele frequencies for associated markers (not shown).

Table 1 Epidemiologic associations with dengue clinical presentation

Genotyping was successful for 93.5% of the markers and 98.8% of the samples. The two analysts’ genotype calls were highly correlated (r=0.99±0.03). For successful markers, 1% of genotypes failed on average, and 83% were polymorphic (minor allele frequency >0.05). In all, 77.6% (593/768) of the markers were genotyped successfully and were polymorphic. The mean and median minor allele frequencies were 0.269 and 0.265, respectively. Hardy–Weinberg (HW) proportions were observed for >98% of the polymorphic markers. Those that deviated from HW proportions were examined for potential genotyping errors and did not show a significant association with DHF.
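A check of HW proportions for a biallelic marker can be sketched as a one-degree-of-freedom chi-square goodness-of-fit test, as in the Python example below. The text does not state which test was applied, so the choice of test is an assumption, and the genotype counts are hypothetical.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for departure from Hardy-Weinberg proportions.

    Takes observed counts of the three genotypes at a biallelic marker
    and compares them with the counts expected from the allele frequency.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


CRITICAL_1DF_05 = 3.841  # chi-square critical value, 1 df, alpha = 0.05

# Hypothetical counts: one marker in HW proportions, one with no heterozygotes.
stat_in_hwe = hwe_chi_square(25, 50, 25)
stat_out_of_hwe = hwe_chi_square(50, 0, 50)
```

A statistic above the critical value would flag the marker for review of possible genotyping error.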

The genotyping error rate was estimated from analysis of six replicates and two trios. In addition, 25 X-linked markers were used to check hemizygosity in males, and the one heterozygous male sample was excluded. The error rate by replicate inconsistencies was 0.13%, by sexual inconsistencies 0.01, and by Mendelian inconsistencies 0.07%. Seven polymorphic SNPs were also genotyped using a primer extension method (DNAprint, Sarasota, FL, USA) for 65 samples and no discordant genotypes were identified.

Single locus and genotype analysis

Comparing the hospitalized possible/probable DHF cases with the outpatient DF cases, there were 58 markers with P<0.05 out of 593 informative markers. The gene with the largest percentage of significant markers was JAK1. There were 11 out of 18 SNPs in JAK1 with P<0.05, and 6 of these had the most significant values overall (Table 2). Five markers in JAK1 (rs11208534, rs2780831, rs310196, rs310222, and rs310216) were significantly associated with DHF (Table 2) using a conservative cutoff of q<0.2 for the FDR, and three markers nearly reached significance using the Bonferroni correction (1.6 × 10−4).

Table 2 Rank order of single-nucleotide polymorphism associations on chromosome 1

The effect size of JAK1 polymorphisms on DHF risk was assessed by logistic regression of each locus for DHF compared with DF and controlled for age, sex, income, and ancestry. Increased risk was associated with homozygous genotypes for rs11208534 (TT) and rs2780831 (GG), whereas the TT genotype was protective for rs310196 (Table 3). Risk for the combined hospitalized/probable and possible DHF compared with outpatient DF was associated only with the TT genotype for marker rs11208534 (odds ratio (OR): 5.19, confidence interval (CI): 2.13–12.66) and the GG genotype for rs2780831 (OR: 2.61, CI: 1.40–4.90), whereas the homozygous TT genotype was protective for rs310196 (OR: 0.30, CI: 0.15–0.57). In the stratified analysis, the hospitalized possible DHF cases had ORs similar to the probable cases. The same was true for the comparison of DHF with asymptomatic controls; however, the strength of association was weaker for this comparison and only reached significance when the probable and possible DHF cases were combined (not shown). The comparison of the combined possible asymptomatic and DF individuals with the combined DHF cases produced no significant associations.

Table 3 Genotype risk for DHFa

The P-values and ORs for the association of DHF compared with DF were mapped to the JAK1 intron–exon structure (Figure 2). The most significant SNPs as well as those with the greatest impact on risk for DHF were located toward the 5′ end of the JAK1 gene. Although the P-values for the three principal SNPs in JAK1 were similar, the OR for rs11208534 (3.9) was the largest.

Figure 2

Distribution of JAK1 P-values and ORs. The P-values were converted to –log10 p, and OR values were transformed by inversion where necessary to ≥1. Exon–intron organization based on map structure in Genbank (Build 36.3) is shown below.
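The two transformations named in the caption can be sketched as follows in Python. The function names are hypothetical; the example inputs are the protective OR reported in the text for rs310196 (0.30) and a round illustrative P-value.

```python
import math


def neg_log10_p(p_value):
    """Convert a P-value to the -log10 scale used for plotting."""
    return -math.log10(p_value)


def invert_or_if_protective(odds_ratio):
    """Express an odds ratio on the >= 1 scale by inverting protective values."""
    return odds_ratio if odds_ratio >= 1 else 1 / odds_ratio


# OR of 0.30 is the value reported in the text for rs310196.
plotted_or = invert_or_if_protective(0.30)
# P = 0.01 is an illustrative value, not a result from the study.
plotted_p = neg_log10_p(0.01)
```

Inversion places protective and risk effects on a common magnitude scale, at the cost of hiding the direction of effect in the plot.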

The LD structure of JAK1 for this sample and marker set was composed of four LD blocks and four independent markers. The SNPs rs11208534 and rs2780831 were located in the same linkage block and rs310196 was an independent marker (D′ rs2780831/rs11208534=1, rs2780831/rs310196=0.924, and rs11208534/rs310196=0.856).
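For reference, the D′ statistic quoted above is the LD coefficient D scaled by its maximum possible magnitude given the allele frequencies. The Python sketch below assumes the two-locus haplotype frequency is already known; Haploview estimates it from unphased genotypes, a step not shown here. The function name and example frequencies are hypothetical.

```python
def d_prime(p_a, p_b, p_ab):
    """Normalized linkage disequilibrium |D'| between two biallelic loci.

    p_a, p_b: frequencies of allele A at locus 1 and allele B at locus 2
    p_ab:     frequency of the haplotype carrying both A and B
    """
    d = p_ab - p_a * p_b
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # a monomorphic locus carries no LD information
    return abs(d) / d_max


# Hypothetical frequencies: allele B occurs only on haplotypes carrying A.
complete_ld = d_prime(0.3, 0.2, 0.2)
# Hypothetical frequencies: haplotype frequency equals the product (no LD).
no_ld = d_prime(0.5, 0.5, 0.25)
```

A value of 1 means at least one of the four possible haplotypes is absent, which is why two SNPs with different allele frequencies can still show D′=1.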

Haplotype analysis

Haplotype analysis was performed using a sliding window approach in which SNPs were grouped 2, 3, 4, and 5 at a time. We found that the P-value of the JAK1 markers declined with windows >2. However, associations for markers in RNaseL and IKBKE increased with increasing window size compared with SNP analysis (Figure 3).

Figure 3

Sliding window haplotype analysis on chromosome 1. SNPs were sequentially grouped in windows of 2, 3, 4, and 5 and tested for association with DHF. The results for 1, 2, and 5 SNP windows are shown. The y axis shows P-values transformed to –log10 p. The x axis provides the relative map order and distance along chromosome 1 for each marker. The order of each candidate gene is shown at the top of the graph.

In another approach, the high-risk homozygotes for the three most significant SNPs (rs11208534, rs2780831, and rs310196) were combined into a single haplotype and compared for association with DHF (Table 4). As rs11208534 and rs2780831 were in strong LD, we also evaluated the two independent SNPs separately. The haplotypes produced smaller P-values and narrower CIs (Table 3). When controlled for age, sex, ancestry, and income, logistic regression for combined DHF cases gave P=0.0025, OR: 2.7 (95% CI: 1.4–5.3) for the haplotype consisting of three SNPs, and P=0.0023, OR: 2.9 (95% CI: 1.5–5.8) for the two SNP haplotype.

Table 4 Genotypic associations with ancestry

Effect of population stratification and covariates

Of all the demographic variables tested, only African ancestry and income entered the regression model. Therefore, both the genotypic and haplotypic regression analyses were controlled for these variables as well as for age and sex. The three markers most associated with DHF (rs310196, rs2780831, and rs11208534) each showed significant associations with African ancestry and income in this population (Supplemental Table 2). Only rs11208534 and rs310196 were also genotyped in the HapMap populations, and these also showed significant differences between the European and African populations (minor allele frequencies 0.12 vs 0.23 and 0.12 vs 0.50, respectively, P<0.01). In Salvador, for rs11208534, self-identified whites had a minor allele frequency of 0.13 compared with 0.22 for blacks. For rs310196, the frequencies were 0.20 and 0.37, respectively, P<0.05. The proportion of African ancestry was significantly lower for the susceptible genotypes, consistent with the association of African ancestry with protection from DHF in this population.16 There was a weaker association between income and each of the genotypes. There were no statistically significant differences for sex, income, or genotypes in JAK1 between the probable and possible cases, except for one SNP not associated with DHF. This indicates that these two groups are epidemiologically and genetically similar for these important variables and justifies combining them.

CD209 and DHF in Salvador, Brazil

We genotyped 11 markers in the CD209 gene including the promoter polymorphism most strongly implicated in an earlier study (rs4804803, a.k.a. DC-SIGN1 −33618), but we found no association with DHF in this Brazilian population (Supplemental Table 3).

DISCUSSION

The study of dengue in the Americas is a unique opportunity to witness the sequential introduction of a ‘new’ pathogen into a naive population. DENV-2 was introduced to the city of Salvador in 1995. This was followed by the introduction of DENV-1 and the co-circulation of these viruses until, by 2002, 70% of the population was seropositive for at least one serotype.2 The year 2002 saw the introduction of DENV-3 and the city's first cases of DHF. These were the cases collected for this study. As is typical for most endemic areas, the majority of those infected did not experience DHF despite a high percentage of secondary infections. Individual host genetics may therefore influence the risk for developing DHF.

In this study, the strongest genetic association with DHF was found for polymorphisms located at the 5′ end of the JAK1 gene. Identification of JAK1 as an important gene in the clinical presentation of dengue is consistent with an emerging picture from diverse lines of investigation. Mouse studies have shown that some components of the type 1 IFN response pathway can control the response to all flaviviruses.12, 13 In addition, flaviviruses have developed specific strategies to neutralize signaling downstream of the receptor to circumvent this response.24, 25 Supportive evidence is also provided by a global analysis of expression profiles in individuals with dengue shock syndrome compared with DF, which indicated that multiple IFNα-regulated genes were under-expressed.26 The effect on multiple genes at the end of a pathway may suggest suppression of an early step in signaling. As JAK1 is one of the two signaling proteins associated with the type I IFN receptor (the other is TYK2), regulatory polymorphisms in this gene could produce the pattern of downstream under-expression seen in severe dengue. JAK1 is also a component of many other signaling pathways, but the accumulation of evidence from other studies12, 13, 24, 25, 26 suggests that its importance here is as a component of the IFNα/β response pathway.

The association of the identified JAK1 SNPs with DHF is also consistent with the epidemiology of this condition in the western hemisphere. The decrease in the frequency of the susceptibility alleles among those with greater proportions of African ancestry follows the epidemiologic and immunologic differences observed in the distribution of DHF.3, 27, 28, 29

We did not observe any association between DHF and the CD209 locus in this population (Supplemental Table 3), despite an earlier study showing ORs >14 for an SNP (rs4804803) in this gene.11 The failure to show an association with this gene and SNP may be due to population-based genetic heterogeneity between Brazilians and Thais. There are also important differences in the clinical presentation of the DF patients for the two studies. All of the Thais with clinically apparent dengue were hospitalized and were classified as severe dengue, whereas the Brazilians with probable DF in this study were outpatients and unlikely to be classified as severe. As the study was conducted retrospectively, acute sera were not available for most of those enrolled. Thus, our DF patients correspond to probable cases by WHO criteria, unlike the hospitalized DF cases in the Thai study in which acute serologic profiles were obtained. Finally, the study in Thailand enrolled children exclusively, whereas we primarily enrolled adults; age-specific differences in disease severity may therefore also explain the failure to see an association with CD209 in our population. These enrollment characteristics for the two studies were driven by the differential age distribution of DHF between these two regions. Although DHF cases in Latin America occurred predominantly in adults in 2004, in Southeast Asia they occurred most frequently in children.30

As this was a retrospective study, we were unable to distinguish primary and secondary infections. Whether this poses a problem in classification for genetic studies is unclear. In the small number of studies in which the analysis was stratified in this way, most found no difference for the loci examined, except in the largest HLA studies.11, 31, 32, 33, 34 There may be genetic differences based on earlier immune status, but apparently not for all associations. A retrospective study may also suffer more from misclassification, as the IgG serology is less specific than the IgM serology obtained in the convalescent period. The clinical classifications should thus be considered provisional.

The strongest association between JAK1 and DHF was observed when those cases with DHF were contrasted with the DF cases, rather than with the asymptomatic but seropositive controls. A similar pattern of association was observed in the study in Thailand for CD209. This may suggest that additional genetic loci or environmental factors, such as viral inoculum or nutrition, have a function in addition to the risk genotypes. The strong influence of socioeconomic factors also suggests a function for environmental factors, or it may simply correlate with the usage of or access to medical services.

The location of the associated polymorphisms indicates either that there is a regulatory element in one of the first three introns or that these SNPs are in LD with an as yet unidentified polymorphism in an exon. At present, only one SNP has been identified in an exon of this region of JAK1, rs35237903, which is 3730 bp downstream of the SNP with the strongest signal, rs11208534. Clearly, resequencing efforts should be directed to the 5′ region of the gene. Studies to replicate the findings reported here are underway in a region of Brazil with a different population profile, as are laboratory-based investigations of JAK1 and DENV interactions.