Main

The scourge of AIDS, discovered three decades ago, is the most devastating epidemic of our time. The cumulative death toll is approaching 25 million, and the World Health Organization estimates that 42 million people today are infected with human immunodeficiency virus (HIV), a virus that, left untreated, kills 90% of its victims. Although considerable progress has occurred in the development of therapy for AIDS, the 16 available antiretroviral drugs are not always effective, are frequently toxic and do not clear virus from sequestered tissue reservoirs where HIV proviral genomes capable of rebound can linger for decades1,2,3. The strong drugs that slow the progress of AIDS are only now being distributed in the developing world, where they are most needed. HIV's replication efficiency, broad tissue dissemination and mutational plasticity in infected people have made vaccine development a considerable and still unsolved challenge4,5,6. For Africa, where 70% of the world's HIV-positive individuals live and die, United Nations Secretary General Kofi Annan recently named HIV/AIDS “one of the main obstacles of development itself”7.

Although AIDS is not generally considered a genetic disease, the considerable heterogeneity in the epidemic is at least partially determined by variants in genes that moderate virus replication and immunity5,8,9. Not all people exposed to HIV-1 become infected, and those who do progress to AIDS-defining pathology at different time intervals. Rapid progressors succumb to AIDS in 1–5 years whereas long-term 'nonprogressors' avert AIDS for up to 20 years. There are multiple AIDS-defining conditions, including Kaposi's sarcoma, lymphoma, Mycobacterium tuberculosis, Pneumocystis carinii pneumonia (PCP) and cytomegalovirus infections. Infected individuals have heterogeneity in the strength of their innate, humoral and cell-mediated immune responses as well as differences in how they respond to antiretroviral treatment.

AIDS restriction genes

In the early 1980s, geneticists at the US National Cancer Institute's Laboratory of Genomic Diversity initiated a program to search the human genome for AIDS restriction genes (ARGs; human genes with polymorphic variants that influence the outcome of HIV-1 exposure or infection10,11). We began by enlisting collaboration with epidemiologists who were establishing prospective AIDS cohort studies. These included American people in groups at risk for exposure to HIV-1: gay men, hemophiliacs exposed to clotting factor contaminated with HIV-1 before the introduction of HIV-1 antibody blood screening in 1984 and intravenous drug users who shared needles in areas of high HIV-1 incidence (Table 1). Clinical, virological and immunological data were assessed twice yearly for each individual through the course of his or her infection. We transformed viably frozen lymphocytes from study participants with Epstein-Barr virus to produce lymphoblastoid cell lines, a renewable source of DNA for population-based DNA assessments. Almost 8,500 study participants comprise our present study population, 6,007 with lymphoblastoid cell lines and the balance with nonrenewable DNA specimens (Table 1).

Table 1 Natural history of AIDS cohorts

We selected candidate ARGs from the growing body of host factors nominated to be involved in HIV-1 pathogenesis (Fig. 1). We resequenced selected human genes and adjacent regulatory regions to detect common single-nucleotide polymorphism (SNP) variants to genotype in the cohort populations, and we examined case-control comparisons of SNP allele and genotype frequencies and genetic and linkage equilibrium tests to uncover genetic influences. For ARGs that influenced the rate of AIDS onset in individuals infected with HIV-1, we used Kaplan-Meier survival curves and Cox proportional hazards statistics to examine the effect of different genotypes on specific AIDS outcomes12,13,14.
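The survival-curve step can be illustrated with a minimal sketch of the Kaplan-Meier product-limit estimator. The function name `kaplan_meier` and the follow-up data are hypothetical and chosen only for illustration; they are not taken from the cohorts in Table 1, and a real analysis would use an established statistics package.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where an event occurred.

    times  -- follow-up time per person (e.g. years since seroconversion)
    events -- 1 if AIDS onset was observed at that time, 0 if censored
    """
    onsets = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)      # everyone leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = onsets.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving this time.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]     # drop both events and censored people
    return curve

# Hypothetical follow-up data for two genotype groups (illustrative only).
wild_type_curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
carrier_curve = kaplan_meier([4, 6, 9, 11, 12], [1, 0, 1, 0, 0])
```

Comparing the two curves shows how a protective genotype would appear as a curve that stays higher for longer; a Cox model then summarizes that difference as a single relative hazard.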

Figure 1: Illustration of how HIV-1 enters cells, suggesting candidate ARGs for inspection.

When HIV-1 infects a person, it seeks out tissue cell compartments where it can replicate, including lymph nodes, neural tissue, epithelium in gut or vagina, spleen, testes, kidneys and other organs. HIV's principal factories are macrophages, monocytes and T-lymphocytes, all of which carry the CD4 cell surface protein. HIV-1 enters various cells by co-opting two receptor proteins on the cell surface92,93. The CD4 molecule acts as a docking station, which hooks the HIV surface envelope protein and leaves the virus hanging atop the cell. Then a large seven-transmembrane-spanning cell-surface receptor, CCR5, which floats around the fluid-like cell surface membrane, meanders into contact with the CD4-snagged HIV. The CCR5-HIV interaction causes the HIV gp41 protein to change shape and penetrate the cell membrane; the HIV particle then fuses with the cell membrane and infects the cell. Infected individuals produce one to ten billion virus particles each day throughout the course of infection. The median time to CD4 cell collapse and onset of AIDS-defining pathologies is 10 years. In most infected people in Europe and the USA (where HIV-1 clade B is predominant), a mutational shift occurs in env (called R5 to X4 tropism), which alters coreceptor preference from CCR5 to CXCR4, usually coincident with CD4 depletion. The normal chemokine ligands specific for CCR5 (RANTES, MIP1α and MIP1β) and CXCR4 (SDF1) physically block CCR5 (R5) and CXCR4 (X4) HIV-1 infection, respectively, by covering their entry coreceptors6,23.

There were many rationales for identifying ARGs, but we anticipated three principal applications. First, determining the effects of ARGs in individuals with AIDS would connect laboratory-identified host factors to AIDS pathogenesis explicitly, particularly when a plausible functional or physiological mechanism was found. Second, because the powerful antiretroviral therapy is neither universally effective nor a true cure, genetic retardation of AIDS progression could implicate new cellular targets for anti-AIDS therapy that complement the currently available antiretroviral compounds15. Third, ongoing clinical trials for new vaccines and anti-AIDS drugs, like longitudinal cohort studies, show that these drugs elicit heterogeneous responses from people with AIDS. As some portion of this heterogeneity is attributable to the distribution of ARG genotypes in the study population, a quantitative ARG score, a 'genetic propensity index' (GPI) for individuals that could be used to adjust (or subtract) the genetic 'noise' in clinical trials based on each participant's ARG genotype16, would be useful.

Since the first ARG variant, CCR5 Δ32, which effectively blocks HIV-1 infection in homozygous people and slows AIDS progression in heterozygotes17,18,19, was reported, our group and others have documented thirteen additional ARGs through genetic association in these same cohort populations (Table 2). Their genetic influences differ in several ways, including mode (dominant, codominant, recessive), stage at which they act (infection, AIDS progression, the type of AIDS-defining pathogenic disease) and the interval of HIV-1 progression during which the influence is apparent. Strategies for discovering ARGs and evidence for genetic association have been reviewed9,14, as have the physiological mechanisms of ARG action8,16, variation in ARG frequency and influence among ethnic groups20,21 and replication in independent cohort populations8,14,22,23,24. ARGs are polygenic and multifactorial (i.e., they must interact with environmental factors, such as HIV-1 exposure, in order to be detected).

Table 2 Genes that limit AIDS

In this review, we explore the hallmarks of genetic association that indicated that these genes influence AIDS pathogenesis and the translational benefits of ARGs to therapy development. We discuss the quantitative influence of individual and composite ARG genotypes in predicting AIDS outcomes for a person with HIV and for a study population. We estimate the influence that known ARGs have on the epidemic and compute a new information theory–based epidemiological parameter, the 'explained fraction' (EF, Box 2) to quantify the overall influence of multiple ARGs on the rate at which a population progresses to AIDS (ref. 25 and G.W.N. & S.J.O., unpublished data). The EF offers a quantitative yardstick that estimates the proportion of epidemiological variance in AIDS progression (or in any complex multifactorial disease) contributed by one or more ARGs in the study population.

Validating ARGs in association studies

The ARGs in Table 2 were identified and validated by genetic association studies comparing alternative disease categories (e.g., infected with HIV-1 versus exposed to HIV-1 but not infected, or fast progressors versus slow progressors) for departure of allele, genotype or linkage equilibrium using the expectations of population genetic theory. Association analyses can be misleading when statistical significance alone is the basis for gene identification, and there are too many examples of artifactual associations in the literature from studies that are underpowered statistically, are inappropriately structured with respect to ancestral population admixing or contain sampling errors26,27,28. To avoid mistaken associations, several authors have proposed benchmarks or stringent criteria for validation28,29.
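A case-control comparison of allele frequencies can be sketched as a 2×2 chi-square test. This is an illustrative sketch only: the function name and the allele counts are hypothetical, no multiple-test correction or population stratification is applied, and the one-degree-of-freedom P value uses the identity P(χ² > x) = erfc(√(x/2)).

```python
from math import erfc, sqrt

def allele_chi_square(case_variant, case_other, control_variant, control_other):
    """Pearson chi-square (1 d.f.) for a 2x2 table of allele counts.

    Returns (chi_square, p_value).
    """
    a, b, c, d = case_variant, case_other, control_variant, control_other
    n = a + b + c + d
    chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For one degree of freedom the upper tail is erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(chi_square / 2.0))
    return chi_square, p_value

# Hypothetical allele counts: infected individuals versus exposed, uninfected.
chi_square, p_value = allele_chi_square(30, 70, 10, 90)
```

A small P value from a single test of this kind is exactly the sort of evidence that the criteria in the following paragraph are meant to guard against over-interpreting.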

For ARGs, there are five common aspects of a credible genetic association: (i) High number of clinical cases, leading to low P values for significance after statistical correction for multiple tests. Power calculations suggest that more than 400 cases are adequate to detect a genotypic association with a relative risk (RR) ≥2.0. (ii) Independent replication of significant associations in different populations, in different ethnic groups, in different risk groups, in different cohorts and by different laboratories or investigators. If ARG allele frequencies differ among ethnic groups, the analysis should be stratified to avoid population structure artifacts30,31. (iii) Epidemiological influences that have high RR, high relative hazard (RH) and high attributable risk (AR, Box 1). Low risks mean the gene is either relatively unimportant or a statistical fluke. (iv) Gene associations assessed alone, as well as with a statistical adjustment for other previously associated gene types. This latter consideration tests epidemiological independence of multiple ARGs and uncovers epistatic interaction of genes with related physiological roles. (v) Plausible functional explanations that connect the operative SNP to a quantitative difference in a gene product and relate the change to AIDS development. Although not all five criteria were fulfilled for every ARG in Table 2, most ARGs met most of the criteria. Their functional connections to AIDS progression are summarized in the next section.

Mechanism of action for ARGs

AIDS is primarily a disease of the immune system caused by a virus evolved to infect, take over and debilitate the host's highly evolved innate and acquired immune defenses. Naturally occurring genetic variants in genes empirically connected to AIDS pathogenesis were discovered in the two decades of AIDS dissemination and study. The genetic epidemiological associations have been interpreted in the context of an implicated gene's role in development of AIDS, and in each case a plausible functional mechanism has led to varying levels of empirical support (Table 2).

The steps in establishing a chronic HIV-1 infection leading to AIDS are illustrated in Figure 1. Genes encoding cellular molecules involved in the process were candidate ARGs. CCR5 Δ32 is a naturally occurring knockout deletion (32 bp) variant. Homozygotes resistant to R5-HIV-1 infection (the principal infecting HIV-1 strain) lack the requisite HIV-1 entry coreceptor CCR5 on their lymphoid cells8,17. Heterozygotes express less than half the wild-type levels of CCR5 receptors, slowing HIV-1 replication, spread and pathogenesis32,33. The CCR5 P1 allele, a composite haplotype comprising 13 distinct SNP alleles in the upstream promoter region of CCR5, confers recessive more rapid progression to AIDS34,35. Analysis of CCR5 promoter alleles found no quantitative differences in luciferase transcription, HIV-1 binding or HIV-1 infectivity, although the alleles influence CCR5 abundance on lymphoid cells36,37. Further, oligonucleotides that define the CCR5 P1 allele showed allele-specific recognition of nuclear transcription factors belonging to the cREL family, a result that suggests, but does not prove, that CCR5 promoter alleles could differentially react and respond to transcription factors in different cell types37.

The CCR2 V64I variant mediates delay in AIDS progression38,39 indirectly, as it causes no allele-specific quantitative differences in the amount of CCR2 produced, in the ability of alternative allelic products to bind HIV-1 or in signal transduction with CCR2-specific ligands40,41. One report indicated that CCR2 I64 protein can preferentially dimerize with CXCR4 polypeptides (the HIV-1 receptor that replaces CCR5 as an entry receptor at later stages; Fig. 1) whereas the wild-type CCR2 peptides do not42. Though it is unconfirmed, this mechanism suggests that CCR2 V64I delays AIDS by limiting the transition from CCR5 to CXCR4 in infected individuals, a turning point in the collapse of the CD4–T lymphocyte cell population and a prelude to AIDS-defining disease23.

RANTES is a principal chemokine ligand for CCR5. Elevated circulating RANTES levels have been detected in exposed individuals who avoid infection and also in people infected with HIV-1 who have a delayed onset of AIDS43,44. One of the seven SNP variants found in CCL5 (encoding RANTES), In1.1C, is nested in an intronic regulatory sequence element that has differential allele affinity to nuclear binding proteins and whose transcription is downregulated by a factor of 4 (ref. 45). A reduction in RANTES production in individuals carrying this allele leads to rapid AIDS progression, ostensibly by uncovering available CCR5 that facilitates the replication and spread of HIV-1.

The CXCL12 3′A variant is a G→A transition in the 3′ untranslated region of one of two alternatively spliced transcripts of CXCL12 (also called SDF1; ref. 46). Stromal-derived factor (SDF1) is the primary ligand for the late-stage HIV-1 receptor CXCR4, and the 3′A variant is 37 bp downstream of two blocks whose sequences are 88% and 92% conserved, respectively, in mouse and human transcripts46,47. The relatively high sequence homology is a signal for selective constraints on mutational divergence of the region because it is a putative recognition sequence for RNA- (or DNA-) binding regulatory factors. Although allele-specific cellular or virological differences have not been explicitly established, the delay in onset of AIDS observed in CXCL12 3′A homozygotes might result from overproduction of SDF1 in certain tissue compartments, postponing the CCR5-CXCR4 transition. A synergistic protective effect in individuals carrying CXCL12 3′A and CCR2 I64 would be consistent with this model46.

Quantitative functional allelic distinction has not been confirmed for two ARGs: CXCR6 E3K, which is associated with late-stage progression in individuals with AIDS and Pneumocystis carinii pneumonia, and haplotypes in the chromosome 17q gene cluster that contain three SNP variants around the chemokine genes CCL2, CCL7 and CCL11 (refs. 48,49). MCP1 (encoded by CCL2) and MCP3 (encoded by CCL7) are requisite ligands for CCR2 and CCR3, respectively, and eotaxin (encoded by CCL11) is a ligand for CCR3, a chemokine receptor implicated in HIV-1 infection of microglial cells in the central nervous system50,51. A few studies have tied MCP1, MCP3 and eotaxin to HIV-1 pathogenesis through interaction with HIV-1 directly or indirectly by stimulating cell division and migration of macrophages and dendritic cells52. One CCL2-CCL7-CCL11 haplotype, H7, which includes SNP variants in a 31-kb region, is associated with resistance to HIV-1 infection, based on comparisons of highly exposed uninfected individuals with exposed infected individuals49.

Two ARGs, interleukin-10 (IL10) and interferon-γ (IFNG), encode powerful cytokines that inhibit HIV-1 replication. The IL10 5′A SNP variant involves a promoter region alteration that reduces IL10 transcription by a factor of 2–4 (ref. 53). IL10 5′A allele-specific synthetic oligonucleotides do not bind certain ETS-family transcription factors, which recognize the wild-type IL10 allele sequence. Heterozygosity and homozygosity with respect to IL10 5′A accelerates AIDS progression, probably owing to downregulation of the inhibitory IL10 cytokine53. IFNG includes a polymorphic promoter region, one allele of which (−179T) is inducible by tumor necrosis factor (TNFα), whereas the wild-type allele (−179G) is not54. In African Americans, among whom the IFNG −179T allele has a 4% frequency (<1% in European Americans), individuals infected with HIV who are heterozygous with respect to IFNG −179T progress to AIDS more rapidly than IFNG −179G/G homozygotes, indicating that allele-specific inducibility confers a risk of rapid AIDS progression55.

The essential role of HLA, the human major histocompatibility complex, in detecting and presenting infectious agent peptides to T-cells is well established. The HLA region includes 128 expressed genes, one-third of which participate in various armaments of human immunity56. HLA class I (A, B and C) and II genes (DR, DQ and DP) have considerable allele variation between individuals and populations, which is probably a reflection of the natural selective influence by historic disease outbreaks that afflicted the ancestors of modern human ethnic groups. The abundant variation in HLA alleles provides a broad range for individual recognition of virus agents to which they had been exposed in the past, as well as those to which they had not57,58.

HIV-1 infects various immune cells as the prelude to HIV-1 replication, proliferation, spread and CD4-T lymphocyte damage. Because different HLA alleles specify cell-surface molecules with specific recognition sites for infectious agents59, differential HIV-1 peptide motif recognition can influence both the time interval from infection to AIDS60 and the kinetics of HIV-1 adaptive escape from immune surveillance in an infected individual61. For example, carriers of HLA-B*35 experience rapid progression to AIDS and HLA-B*35 homozygotes develop AIDS in half the median time that it takes for infected people without HLA-B*35 to develop AIDS62. Molecules encoded by HLA-B*35 comprise a serologically defined, closely related group with two distinct peptide recognition specificities. HLA-B*35-PY molecules recognize processed HIV peptides of nine amino acids with proline in position 2 and tyrosine in position 9; HLA-B*35-Px molecules present peptides with proline in position 2 and various amino acids not including tyrosine in position 9 (ref. 63). Survival association analysis indicated that HLA-B*35-mediated AIDS acceleration was entirely caused by HLA-B*35-Px, as carriers of HLA-B*35-PY developed AIDS at the same average rate as individuals with no HLA-B*35 alleles60. The HLA-B*35-Px versus HLA-B*35-PY distinction directly connected AIDS progression to HLA-B*35-Px peptide recognition.

The mechanism by which HLA-B*35-Px accelerates progression to AIDS does not seem to stem from a failure of the allele to stimulate cytotoxic T-cell recognition. Several amino acid residues in HIV gag proteins are HLA-B*35-Px targets61, and there is an equivalent induction of cytotoxic T lymphocytes (CTLs) in carriers of HLA-B*35-Px as in carriers of HLA-B*35-PY, indicating that the requisite HIV epitopes corresponding to HLA-B*35-Px and HLA-B*35-PY stimulate CTL induction64. A plausible hypothesis suggests that the HIV epitopes recognized by HLA-B*35-Px act as an 'immunological decoy' that effectively induces CTLs that are ineffective in destroying cells infected with HIV-1, thereby monopolizing CTL defenses. In support of this possibility, carriers of HLA-B*35-PY reduce HIV-1 viral load far more effectively than carriers of HLA-B*35-Px, despite having an equivalent quantitative CTL response64.

HLA influence on HIV-1 sequence divergence was confirmed by examining the pattern of mutation frequency among HIV-1 genomes in 473 Australians infected with HIV-1 (ref. 61). There was a dearth of sequence variants in regions of the HIV-1 genome with HLA recognition motifs that correspond to an infected person's HLA type but a mutational excess in regions with motifs recognized by HLA alleles that were associated with rapid progression, correlating HLA recognition of cognate HIV-1 motifs with AIDS survival. Two HLA alleles, HLA-B27 and HLA-B57, are consistently associated with a delayed onset of AIDS9,65. These HLA genotypes seem to be more effective in constraining HIV-1 mutational escape from immune clearance than other HLA alleles9,16.

The breadth of HLA allelic and immunological diversity is good for individuals with AIDS and for populations at risk for AIDS. People who are homozygous with respect to one or more of their HLA class I genes (A, B or C) progress to AIDS much faster than individuals who are heterozygous with respect to HLA-A, HLA-B and HLA-C62,66. The advantage of maximal HLA heterozygosity in individuals (or populations) infected with HIV-1 stems from maximizing the breadth of virus epitope recognition, driven to the extreme by a swarm of HIV-1 mutational variants produced each day.

Some provocative HLA associations involving functionally relevant HLA allele groupings with HIV infection and progression have also been reported67,68. Certain HLA supertypes, grouped according to B-pocket contact residues of the antigenic peptide recognized by HLA class I alleles, show modest associations with HIV transmission and with circulating set-point HIV concentration (viral load)68. Alleles of the HLA-Bw4 grouping, which comprises 40% of HLA-B alleles that share a seven–amino acid motif, are associated with long-term nonprogressors, although the study reporting this was controversial65,67. HLA-Bw4 alleles serve as a ligand for certain natural killer cell receptors encoded by the killer immunoglobulin receptor (KIR) gene complex on chromosome 19 (ref. 69). The combination of HLA-Bw4 and KIR3DS1 has an epistatic protective influence on AIDS progression70. An interaction between the HLA-Bw4 ligand molecules and their corresponding 'activating' receptor 3DS1 on natural killer cells seems to facilitate clearance of HIV-1-infected lymphocytes, slowing AIDS progression. In all, the immune response genes are fertile ground for discovery of important and powerful ARGs.

Implications of ARGs for AIDS therapy

The sixteen available anti-HIV-1 drugs target two enzymes encoded in the HIV-1 genome: reverse transcriptase and protease15. Recent drug formulations that allow single daily doses have improved compliance and prognosis, but there is still a pressing need for better, safer, cheaper and more effective anti-AIDS drugs. Defining the early steps in HIV-1 infection and AIDS progression (Fig. 1) has invigorated the search for new approaches to AIDS treatment71,72,73,74. Specific inhibitors of HIV-1 entry into cells and integration in cellular chromosomes can block cellular factors that HIV-1 exploits in AIDS pathogenesis. After CD4, CCR5 and CXCR4 receptors were shown to catalyze HIV-1 infection (Fig. 1), the finding that mutational variants of receptors and their ligands stemmed HIV-1 infection and progression (Table 2) raised hopes that synthetic compounds that blocked the process would be suitable inhibitors of virus-cell interaction. Inhibitors of HIV-1 entry have been developed that block CD4 binding (PRO-542 and BMS0806), CCR5 binding (SCH-C, SCH-D, PRO-140, UK-426, UK-857), CXCR4 binding (AMD-3100) and gp41-mediated membrane fusion (T-20, T-1249). All these drugs are in advanced stages of clinical trials15, and one, T-20, was approved for public use in March 2003. In addition, an inhibitor of HIV-1 integrase (S-1360) has reached the clinical setting based on efficacious results in vitro and in an animal model75. Although no human ARG that affects integration has been described to date, there is compelling genetic evidence that the mouse locus Fv1 resists retroviral integration76.

New cell-based therapies have the advantage of being specific, being nontoxic in several cases and targeting host cell factors whose genes, unlike those of HIV-1, do not mutate and evolve resistance rapidly in vivo. Although HIV-1 can evolve resistance to cell-based therapies, the genetic restrictions of variants in CCR5, CXCL12, CCL5 and other ARGs (Table 2) suggest that inhibitors of HIV-1 entry may be effective in treating infected individuals. Finally, ARG variants like CCR5 Δ32 indicate that certain host cellular products are dispensable for normal development and immunity8, which bodes well for cell-based therapies that target their HIV-1 involvement.

A genetic propensity index for composite ARG genotype

The interpretation of clinical trials for new vaccines and anti-HIV drugs would benefit from a method to quantify the influence a particular ARG genotype might exert on the observed heterogeneity in participants in the trial. When considering the time interval from HIV-1 infection to the onset of AIDS, it is possible to measure the influence of single or composite ARGs on AIDS survival directly in our combined AIDS cohort populations14,16. To do this, we computed the RH of alternative genotypes relative to the common 'wild-type' genotype (Table 3) based on the Cox proportional hazards model of Kaplan-Meier survival curves12,13. For multiple loci RH is multiplicative, and the combined RH estimates the GPI for an individual (Table 3). To illustrate the principle, we present individual RHs and composite GPIs for eleven independent (in both epidemiologic and functional context) ARG genotypes (Table 3) and plot the predicted AIDS survival distribution (Fig. 2). An arbitrary baseline of GPI = 1.0 is the composite genotype carrying the wild-type genotype at each ARG locus.
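Because RH is multiplicative across independent loci, a composite GPI is simply a product of per-locus values. The sketch below illustrates this with hypothetical locus names and RH values; they are placeholders and not the entries of Table 3.

```python
from math import prod

# Hypothetical per-locus relative hazards for one person; these are
# placeholders, not the values reported in Table 3.
relative_hazards = {
    "locus_A": 0.5,   # protective genotype (RH < 1)
    "locus_B": 2.0,   # susceptible genotype (RH > 1)
    "locus_C": 1.0,   # wild-type genotype (baseline, RH = 1)
}

def genetic_propensity_index(rh_by_locus):
    """Multiply per-locus relative hazards into one composite GPI."""
    return prod(rh_by_locus.values())

gpi = genetic_propensity_index(relative_hazards)
```

In this toy case the protective and susceptible genotypes cancel exactly, leaving the person at the wild-type baseline of GPI = 1.0, which is the offsetting behavior expected when a composite genotype mixes both kinds of variant.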

Table 3 GPI of ARG genotypes
Figure 2: Predicted survival pattern based on GPI of four composite ARG genotypes from Table 3.

The Weibull distribution is used to model the survival distribution, with baseline shape and scale parameters determined from AIDS cohort data, taking RH for each group from the GPI for the given genotype combination.
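A minimal sketch of such a Weibull survival model follows, assuming the proportional hazards form S(t) = exp(−GPI · (t/scale)^shape). The shape value is hypothetical, and the scale is chosen so that the baseline median equals the 10-year median time to AIDS cited earlier; the parameters fitted to the cohort data are not given in the text.

```python
from math import exp, log

SHAPE = 2.0                     # hypothetical Weibull shape parameter
BASELINE_MEDIAN_YEARS = 10.0    # median time to AIDS cited in the text
# Scale chosen so that the baseline (GPI = 1.0) median is 10 years.
SCALE = BASELINE_MEDIAN_YEARS / log(2.0) ** (1.0 / SHAPE)

def aids_free_survival(t_years, gpi):
    """S(t) = exp(-GPI * (t / SCALE) ** SHAPE) under proportional hazards."""
    return exp(-gpi * (t_years / SCALE) ** SHAPE)

def median_survival(gpi):
    """Time at which predicted AIDS-free survival falls to 0.5."""
    return SCALE * (log(2.0) / gpi) ** (1.0 / SHAPE)

# Composite GPI values quoted in the text for the two extremes and baseline.
for gpi in (0.077, 1.0, 18.84):
    print(f"GPI {gpi}: median {median_survival(gpi):.1f} years")
```

Under this form a GPI above 1.0 compresses the curve toward earlier AIDS onset and a GPI below 1.0 stretches it, which is the qualitative pattern plotted in Fig. 2.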

The GPI for the ARG genotype that contains all the alternative 'non-wild-type' genotypes is 1.46, not much different from the baseline GPI, because the composite genotype containing all the variants includes five 'susceptible' and six 'resistant' genotypes, which offset each other in a survival distribution (Fig. 2). Composite genotypes carrying the most resistant genotype at each of the eleven loci (RH ≤1.0) or the most susceptible genotype at each of the eleven loci (RH ≥1.0) define the extremes (GPI = 0.077 and 18.84, respectively) of predicted survival times (Table 3 and Fig. 2).

In a clinical trial, the GPI predicts the influence of the participants' genotypes on the trial outcome, in principle considering genotypes a confounding variable in the trial. The usefulness of such an application depends on the observed and theoretical variance of actual RH compared with GPI prediction for the study population. This variance affects sampling variance and the fraction of the overall epidemiological variance contributed by the genes we assess. Estimating this latter parameter is the next topic.

Epidemiological influence of ARGs

The AR parameter can be used to quantify the influence of any epidemiological factor on disease incidence14,77,78. AR considers genetic mode (dominant, recessive, etc.), disease incidence and genotype frequency in disease cases (Box 1). The AR for each protective ARG genotype on the rate of AIDS progression (using Equation 1) is given in Table 4. The AR for genotypes that slow AIDS progression ranged from 2.4% (for CCR2 I64) to 6.9% (for CCR5 Δ32) for individual genotypes. When all the genes that slow AIDS progression are considered equivalent, the cumulative AR for long-term survivors is 21.1%. This means that 21.1% of individuals infected with HIV-1 who avoided AIDS for 11 or more years did so because they carried one or more protective ARGs. Among the individuals who develop AIDS rapidly, within 5.5 years of HIV-1 infection, the single ARG effects were modest (2.4–8.1%; Table 4), but the cumulative AR for all the ARGs associated with rapid progression is considerable (40.9%).

Table 4 RR, AR and EF of ARGs on AIDS progression

Although the overall AR values seem large, there are several limitations of the AR statistic78. First, AR was designed for dichotomous factors, not multiple factors. This is a difficult complication because offsetting susceptible and restricting ARGs in most people effectively cancel each other's influence. Second, AR values are not inherently additive; adding AR of multiple risk factors including genotypes will often exceed 100%, distorting the actual influence. Third, AR is asymmetrical with respect to population distribution of a disease in considering only the disease cases. AR measures whether a causal factor is necessary, but does not consider whether it is sufficient, because it does not consider the incidence of the factor in nondisease categories.
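The non-additivity of AR can be seen in a small numerical sketch. Box 1 is not reproduced here, so the code assumes a standard case-based form of attributable risk (Miettinen's formula, AR = p_c(RR − 1)/RR, where p_c is the fraction of cases carrying the factor), which may differ from Equation 1. The two risk factors and their values are hypothetical.

```python
def attributable_risk(case_fraction, relative_risk):
    """Case-based attributable risk: AR = p_c * (RR - 1) / RR.

    case_fraction -- fraction of disease cases that carry the factor (p_c)
    relative_risk -- risk in carriers divided by risk in non-carriers (RR)
    """
    return case_fraction * (relative_risk - 1.0) / relative_risk

# Two hypothetical strong risk factors, each common among cases.
ar_factor_1 = attributable_risk(case_fraction=0.8, relative_risk=5.0)
ar_factor_2 = attributable_risk(case_fraction=0.8, relative_risk=5.0)

# Naively adding them attributes more than 100% of cases to the two factors.
naive_sum = ar_factor_1 + ar_factor_2
```

Each factor alone has an AR of 64%, so the naive sum is 128%, which cannot be read as a fraction of cases.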

For these reasons, we used an epidemiologically symmetrical parameter, the EF, to assess the overall quantitative influence of multiple ARGs or other causal factors on AIDS survival distributions (Box 2). EF is a mutual information-based estimate of the influence of one or more causal factors (genetic or environmental) on disease incidence in the at-risk population; EF for each factor is additive to a maximum of 100% (ref. 25 and G.W.N. & S.J.O., unpublished data). In Table 4, we present EFs for each individual ARG and compare the EF sum for all protective and susceptible genotype combinations with the ARs for the same genotypes. For context we also compute EFs in our cohorts (Table 1) for the well-known nongenetic influence of age79,80.
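The idea of a mutual information-based measure can be sketched as follows. Box 2 is not reproduced here, so the normalization used below, mutual information between genotype and outcome divided by the entropy of the outcome, is an assumption and may differ in detail from the published EF. The function name and the counts are hypothetical.

```python
from math import log2

def explained_fraction(joint_counts):
    """Assumed form: I(genotype; outcome) / H(outcome).

    joint_counts[g][d] -- number of people with genotype class g and
                          outcome class d (e.g. rapid versus other).
    """
    total = sum(sum(row) for row in joint_counts)
    p_genotype = [sum(row) / total for row in joint_counts]
    p_outcome = [sum(col) / total for col in zip(*joint_counts)]

    mutual_info = 0.0
    for g, row in enumerate(joint_counts):
        for d, count in enumerate(row):
            if count:
                p_joint = count / total
                mutual_info += p_joint * log2(p_joint / (p_genotype[g] * p_outcome[d]))

    # Entropy of the outcome; the sketch assumes at least two outcome
    # classes are observed, otherwise this would be zero.
    outcome_entropy = -sum(p * log2(p) for p in p_outcome if p)
    return mutual_info / outcome_entropy

# Hypothetical 2x2 table: rows are genotype classes, columns are outcomes.
weak_effect = explained_fraction([[30, 20], [20, 30]])
```

A measure of this kind is zero when genotype and outcome are independent and reaches 1 only when genotype determines the outcome completely, which is why it treats cases and non-cases symmetrically in a way that AR does not.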

The EF for each gene is small, ranging from ≤0.1% (for CCR2 I64) to 2.4% (for HLA-B*27). The EF for combined genotypes is computed for all combinations of 11 genotypes (73 different combinations). The EF for combined factors was 7.4% for the ARG influence on rapid progressors to AIDS and 9.5% for the explanation of long-term survivors who avoid AIDS for 11 years or more. The sum of the individual ARG EF estimates is 7.2% for rapid progressors and 9.8% for long-term survivors, comparable to the combined ARG genotype EF estimates (G.W.N. & S.J.O., unpublished data). The EF for all ARGs plus age is 9.9% for rapid progression and 12.7% for slow progression (Table 4).

The modest EF has important implications for the application of ARG genotype to prognosis of individuals with AIDS and to clinical trials of new therapeutics and AIDS vaccines. Because more than 90% of the epidemiological variation is not explained by the sum of ARG genotype, any weighting of study participants would have a large unexplained variance; the unexplained variance is ten times greater than the variance explained by the genes. This means that predictions based on ARGs in cohort studies or vaccine trials today would be imprecise and statistically noisy. This point is illustrated in Figure 3, which shows the time to AIDS predicted by GPI of 600 individuals plotted against the actual time to AIDS for the same individuals or for different cohorts. Both regressions show a modest correlation (R² = 0.14 for each; Fig. 3), which, though statistically significant, would hardly be useful in a clinical trial setting. As such, there is much left to explain in the varying influences on AIDS progression.

Figure 3: Ability of imputed GPI-based median survival to predict actual AIDS survival.

Actual AIDS-free survival91 versus the mean of the Weibull distribution for the subject's GPI and scale parameter, calculated by accelerated failure time regression (SAS PROC LIFEREG). Subjects are European Americans in combined AIDS cohorts (Table 1). (a) Actual survival of all subjects versus predicted survival from GPI calculated from accelerated failure time regression on all subjects. (b) Survival of subjects in the Multicenter Hemophilia Cohort Study and San Francisco City Clinic Men's Study cohorts versus predicted survival from GPI calculated for subjects in Multicenter AIDS Cohort Study and ALIVE Study cohorts (Table 1).
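As a minimal sketch of how a predicted survival time can be obtained from a Weibull accelerated failure time fit, the Python below assumes the standard parameterization log(T) = linear predictor + sigma × W, where W follows a standard minimum extreme-value distribution. Under that assumption T is Weibull with scale exp(linear predictor) and shape 1/sigma. The intercept, GPI coefficient and sigma used in the example are invented for illustration and are not estimates from the cohorts described here.

```python
import math


def weibull_aft_mean(linear_predictor, sigma):
    """Mean of T when log(T) = linear_predictor + sigma * W (Weibull AFT)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    weibull_scale = math.exp(linear_predictor)
    # Weibull shape is 1/sigma, so the mean is scale * Gamma(1 + sigma).
    return weibull_scale * math.gamma(1.0 + sigma)


def weibull_aft_median(linear_predictor, sigma):
    """Median of T under the same model: scale * (ln 2) ** sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return math.exp(linear_predictor) * math.log(2.0) ** sigma


# Hypothetical fitted values (assumptions, not results from the paper).
intercept = 2.3   # log-years
beta_gpi = -0.4   # change in log survival time per unit of GPI
sigma = 0.6       # AFT scale parameter

for gpi in (-1.0, 0.0, 1.0):
    lp = intercept + beta_gpi * gpi
    print(
        f"GPI {gpi:+.1f}: mean {weibull_aft_mean(lp, sigma):.1f} y, "
        f"median {weibull_aft_median(lp, sigma):.1f} y"
    )
```

The mean and median differ for any skewed Weibull distribution, so a predicted-versus-actual comparison should state which summary it uses.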

These computations do, however, offer some optimistic conclusions about AIDS and gene discovery. First, genes with modest RRs and EFs that have measurable quantitative effects on gene expression can be validated with high statistical confidence and replicated in independent cohort designs. Second, although the identified ARGs explain only a small fraction of the variance, their EF (7.3–9.4%; Table 4) is comparable to that of smoking in lung cancer (9.7–11.5%; Box 2). Third, we applied a quantitative yardstick to measure the additive influence of multiple genes on a complex, multifactorial disease, a useful measure that tells us how much more epidemiologic variation needs to be explained. Finally, the identified ARGs were found by SNP association in candidate genes already connected to AIDS by the basic AIDS research community. Because fewer than 20% of the 25,000 human genes have names or suspected functions, many unknown ARGs may exist.

Conclusions

The AIDS epidemic is a defining medical and public health issue of our generation. It has galvanized basic research in virology, vaccine development, antiviral therapy and health education. In this review, we have attempted to highlight important points in identifying human genes that influence complex and multifactorial diseases, with the goal of applying these discoveries to further define the regulatory components of an infected individual's cell physiology in AIDS progression and pathogenesis. The mechanism of AIDS restriction by natural genetic variants has encouraged the development of a new class of anti-AIDS drugs, which block HIV-1 binding to requisite receptors and membrane fusion (Fig. 1). Finally, we have explored the potential of predicting AIDS progression for a given genotype combination to inform clinical trials for new drugs and vaccines. As with genes associated with susceptibility to any complex, chronic disease, the hope is to inform diagnostics, prevention, and vaccine and treatment development. For each application, the ARGs fulfill some but not all expectations, emphasizing the need for further gene identification and interpretation.

We examine two new epidemiological parameters. GPI is a quantitative score based on empirically observed epidemiological survival that predicts the RH and survival of different ARG combination genotypes (Table 3 and Fig. 2). This weighting is an early step toward predicting AIDS outcome among infected individuals by considering the survival influence of the genotypes they carry. EF is an information theory–based parameter designed to estimate the portion of epidemiological variance in an epidemic that is due to a particular ARG genotype or composite multilocus genotype. A modest percentage (9%) of the variation in AIDS progression kinetics is explained by the sum of ARGs now known, but that fraction is comparable to the same estimate for the influence of smoking, a well-known epidemiological factor, on lung cancer (9.7–11.5%).

The nexus of epidemiological patient cohorts and population genetic–based association approaches has shown that gene-based epidemiological analysis is a powerful tool for implicating multiple associated loci in complex disease. The identification and validation of the ARGs in Table 2 offer hope that additional genes not yet known will be discovered not only among known candidate genes, but also by SNP haplotype-based association in high-density genome scans of clinically well-described epidemiological cohorts81. We estimate that more than 90% of the genetic and nongenetic influence on AIDS progression in people infected with HIV-1 is undiscovered. Nonetheless, the validation and interpretation of the 14 known ARGs lend credence to the future search by genome-wide association scans of available cohort populations. Further investigation will be valuable if its translational benefits include improved diagnostics and treatment for AIDS.