Introduction

HIV-1 is the human retrovirus responsible for the HIV/AIDS pandemic, which has claimed more than 30 million lives over the past four decades. HIV infection continues to be a major global public health issue, with currently around 40 million people living with HIV (PLWH). Lifelong antiretroviral therapy (ART) has transformed the disease into a manageable chronic health condition. When available, ART enables PLWH to lead long and healthy lives but there is still no effective vaccine and no cure.

Early in the pandemic, it became clear that the risk of HIV acquisition is highly variable across humans. Socioeconomic and behavioural factors played a central role in this variability with some risk groups, such as intravenous drug users and men who have sex with men (MSM), being disproportionately affected1. Still, even among the most highly exposed individuals, a fraction remained HIV negative2,3. Similarly, important differences in the natural course of HIV infection (such as time from infection to AIDS diagnosis and the occurrence of opportunistic infections or malignancies) were only partially explained by known variables such as age and comorbidities4. Taken together, these clinical and epidemiological observations suggested a role for additional factors in the modulation of the individual response to HIV, including inherited variation in the genes and pathways involved in the retroviral life cycle and in innate or adaptive immunity against the infection.

HIV enters its main target cell, the CD4+ T lymphocyte, by binding to its receptor CD4 and to the co-receptor CC-chemokine receptor 5 (CCR5)5. This binding event triggers the fusion of the viral and human cell membranes, initiating a complex intracellular life cycle that will lead to the production of new viruses (Fig. 1). The natural immune response against HIV infection relies mostly on CD8+ T cells, also called cytotoxic T lymphocytes (CTLs). Upon primary infection, intense HIV replication results in a very high plasma viral load, measured as copies of the HIV RNA genome per millilitre of plasma, which is then partly controlled by the specific CD8+ T cell response. The very diverse human leucocyte antigen (HLA) class I molecules play a central role in this immune response by presenting small viral fragments, called epitopes, at the surface of infected cells. The recognition of these epitopes by CTLs leads to the elimination of HIV-infected cells. A more efficient immune response is linked to a lower viral load during the chronic phase of an untreated infection and to slower disease progression, though it is unable to eliminate the virus6.

Fig. 1: Schematic representation of the HIV life cycle and the HLA-mediated host response.

The viral envelope glycoprotein gp120 binds CD4 and CC-chemokine receptor 5 (CCR5) on the surface of target cells, triggering the fusion of the viral and host cell membranes; host genomic studies have implicated genetic variants in CCR5 (purple background) as modifiers of infectivity. Reverse transcription of the single-stranded RNA genome into double-stranded DNA (dsDNA) occurs using proteins carried by the infecting virion. Viral dsDNA is trafficked to the nucleus, where it is integrated into the human genome. The transcription of viral dsDNA results in viral gene expression and genome replication. Viral mRNA is translated into polyproteins, which are cleaved by the viral protease. Functional proteins assemble with copies of the viral genome at the cell membrane and mature virions bud from the surface. In parallel, as part of the immune response, viral proteins are digested by the host proteasome and processed through tapasin I and II (orange rectangles) into the Golgi, where the epitopes are loaded in the human leucocyte antigen (HLA) class I molecules. The peptide-loaded HLA protein is trafficked to the cell surface and presented to cytotoxic T lymphocytes (CTLs). Variability in epitope presentation by HLA-B alleles, such as the protective allele B*57:01, and in expression levels of HLA-C and HLA-A modifies the response to infection and the set point viral load (red background). TCR, T cell receptor.

As a retrovirus, HIV can be described as a genomic pathogen. Indeed, it not only uses the molecular machinery of the infected cell for replication and dissemination but it also has the remarkable capacity to integrate a DNA copy of its RNA genome into a host cell chromosome. By becoming part of the human genome, HIV can persist in long-term cellular reservoirs for decades, making it extremely challenging to develop therapeutic strategies resulting in complete eradication7.

To better fight HIV infection, we must once again consider the old Delphic maxim: ‘know thyself’. Because HIV is an expert at hijacking human cells and immunity, we have no choice but to improve our understanding of our inner machinery, starting with the most fundamental layer of biological information — the human genome. The exploration of human diversity at the DNA level, long hampered by technological limitations, has been fuelled by the development of new and more powerful tools over the past decades8. Thanks to progress in our understanding of human genetic diversity, in genotyping and sequencing technology, as well as in bioinformatics and data science, it became possible to search for genetic factors that modulate the individual response to HIV, including the resistance and susceptibility to infection and the natural history of the disease in PLWH9.

In this Review, we first present an overview of the technological and conceptual developments that have fuelled HIV host genomic research. We then describe the major genetic factors modulating the natural history of HIV infection — in the HLA class I region and the CCR5 locus. Next, we highlight the recent convergence of human and HIV genomics, which allows longitudinal analyses of host–pathogen genetic interactions. Finally, we explain how genomic knowledge is poised to have a positive impact on PLWH, notably through pharmacogenomic interventions and stratification of care based on polygenic risk scores (PRS) before discussing the short-term and long-term perspectives for translational research and clinical applications of human genomics in the field of HIV.

Discovering host genetic variants

The search for human genetic differences that have an impact on HIV-related outcomes was first motivated by clinical observations, namely the striking variability in individual trajectories of patients in the absence of treatment. It was further propelled by a desire to uncover fundamental physiopathological mechanisms by the careful exploration of genomic variants and their impact on host and viral molecular processes.

Candidate gene studies

In the candidate gene approach, population-level associations are sought between HIV-related phenotypes and specific genetic variants in genes that have been selected based on previous biological knowledge or functional work. The selected variants are typically typed using targeted genotyping assays or Sanger sequencing of the region of interest. This framework was first applied to HIV host genetics in the early 1990s in analyses of allelic variants in genes known or suspected to play a role in HIV pathogenesis or in the antiretroviral immune response. Therefore, genetic associations were reported in two broad categories: genes coding for proteins involved in the HIV life cycle (such as PPIA10 and TSG101 (ref.11)) and immune-related genes encoding molecules implicated in innate and adaptive immune pathways (such as MBL2 (ref.12) and TLR9 (ref.13)) as well as in specific antiretroviral defence mechanisms (such as APOBEC3G14 and TRIM5 (ref.15)). Dozens of genes were tested in multiple cohorts. Unfortunately, as has been the case in the broader field of human genetics, most reported associations turned out to be false positives, notably owing to the small sizes of the studied cohorts, population stratification and the lack of correction for multiple testing. Replication attempts in larger cohorts, where these factors could be better controlled, showed no association for the vast majority of variants16,17,18. In fact, only two major discoveries remain from the candidate gene era: the protective effect of a homozygous 32 bp deletion in CCR5 (CCR5Δ32) against HIV acquisition19,20,21 and the modulating effect of HLA alleles on HIV progression for which early studies, largely in MSM of European ancestry in the United States, noted a strong impact of the HLA-B alleles B*57 and B*27 on delaying time to AIDS onset22.

Genome-wide association studies

Advances in genotyping and sequencing technologies progressively transformed human genetic analyses during the first decade of this century. In particular, the commercial availability of genome-wide genotyping arrays marked the beginning of the era of genome-wide association studies (GWAS). The principle of a GWAS is to simultaneously test very large numbers of genetic variants throughout the genome for potential associations with a phenotype of interest23. This truly agnostic approach finally allowed for a more comprehensive exploration of the human genome. To date, most GWAS have been based on the genotyping of single nucleotide polymorphisms (SNPs) followed by imputation, a process that leverages the linkage disequilibrium property of the human genome to statistically infer the genotypes that are not directly measured. This approach allows near-comprehensive testing of common variants (that is, variants with a minor allele frequency of >1%) in most human populations24.
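The principle described above can be illustrated with a minimal sketch. It is not the pipeline of any cited study: the data are simulated, the sample size, allele frequency and effect size are hypothetical, the function name `association_test` is ours, and the p-value relies on a normal approximation rather than the exact test used by dedicated GWAS software.

```python
import math
import random

GENOME_WIDE_ALPHA = 0.05  # family-wise error rate to control


def association_test(dosages, phenotype):
    """Regress a quantitative phenotype on allele dosage (0, 1 or 2 copies).

    Returns (beta, p_value); the p-value uses a normal approximation,
    which is reasonable for the large sample sizes typical of GWAS.
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        return 0.0, 1.0  # monomorphic variant: nothing to test
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotype))
    beta = sxy / sxx
    rss = sum((y - mean_y - beta * (x - mean_x)) ** 2
              for x, y in zip(dosages, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return beta, 0.0
    p_value = math.erfc(abs(beta / se) / math.sqrt(2))  # two-sided
    return beta, p_value


# Simulated toy data (all values hypothetical): 2,000 people, 50 variants,
# and only variant 0 truly lowers the phenotype by 0.4 units per allele copy.
rng = random.Random(0)
n_people, n_variants, maf, true_effect = 2000, 50, 0.2, -0.4
genotypes = [[sum(rng.random() < maf for _ in range(2)) for _ in range(n_people)]
             for _ in range(n_variants)]
phenotype = [4.5 + true_effect * genotypes[0][i] + rng.gauss(0, 0.7)
             for i in range(n_people)]

# One test per variant, then a Bonferroni correction for the number of tests.
results = [association_test(g, phenotype) for g in genotypes]
threshold = GENOME_WIDE_ALPHA / n_variants
hits = [j for j, (_, p) in enumerate(results) if p < threshold]
print(f"threshold={threshold:.1e}, significant variants: {hits}")
```

With roughly one million independent tests, the same Bonferroni logic gives the conventional genome-wide significance threshold of 5 × 10^-8. Real analyses also adjust for covariates such as genetic ancestry, which this sketch omits.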

The first GWAS of any infectious disease focused on the level of detectable viral genetic material in the blood of untreated, chronically infected individuals during the period of HIV latency25. This phenotype, known as set point viral load (spVL), was selected because of its relative ease of measurement and its known correlation with the rate of progression to AIDS26 and with transmission potential27. The spectrum of interrogated variants was limited by early DNA genotyping arrays, yet genome-wide significant associations were identified in the HLA class I region, the most polymorphic locus in the human genome, known to have a crucial role in the modulation of T cell immunity (see ‘HLA variation in HIV control’, below). These findings were soon validated and expanded by other GWAS performed in independent cohorts, which demonstrated that the genetic architecture of HIV spVL is comparable between the general population of PLWH16,28,29,30 and a particular group of individuals able to maintain low viral loads for prolonged periods of time in the absence of ART, the so-called HIV controllers17,31. The absence of specific genetic factors explaining the HIV controller phenotype was a disappointment in terms of potential therapeutic development. However, it is consistent with what has been found for many complex human traits and diseases: individuals at the extremes of the phenotypic distribution are more likely to carry multiple common variants with weak effects rather than rare, high-impact variants32. Beyond genotyping, a single exome sequencing study has been published so far in the HIV field33, also indicating that rare coding variants with large effect sizes are unlikely to make a major contribution to host control of HIV infection.

GWAS were less successful in the search for determinants of HIV resistance, with no definitive evidence found of human genetic polymorphisms conferring an altered susceptibility to HIV apart from CCR5 variation18,34. However, recent genome sequencing studies of extreme exposure phenotypes35 have shown promising associations in CD101, a gene encoding an immunoglobulin superfamily member implicated in regulatory T cell function36, and in UBE2V1, which encodes a ubiquitin-conjugating enzyme involved in pro-inflammatory cytokine expression37,38 that associates with the HIV restriction factor TRIM5α38. Although both CD101 and UBE2V1 are plausible candidates, further functional studies are required to validate their role in HIV susceptibility. Finally, analyses of GWAS data provide evidence for residual heritability owing to additive genetic effects beyond CCR5 (ref.18) and genetic overlap with behavioural and socioeconomic traits39. These results suggest that larger genomic studies of HIV acquisition may identify additional loci that impact susceptibility and warrant further investigation, potentially in large biobanks.

Several intrinsic limitations make it difficult to investigate the genetic mechanisms potentially involved in HIV resistance. For example, sample sizes are usually small (in the tens or hundreds) because studies need to be performed on highly exposed yet uninfected individuals such as patients with haemophilia exposed to HIV through contaminated blood products34, sex workers in hyper-endemic areas40 or serodiscordant couples (stable heterosexual couples where one partner has HIV infection and the other is seronegative for HIV at enrolment)29. Frailty (or survival) bias is a limitation in cross-sectional studies of HIV cohorts with long-term follow-up, as these cohorts are enriched for genetic factors protecting against HIV disease progression. Another limitation is misclassification bias in studies comparing the genomes of patients with HIV infection to unselected controls from the general population, in which most individuals are in fact susceptible to HIV infection18. The identification of additional genetic determinants of individual susceptibility to HIV infection will require increased sample sizes (ideally in the thousands) as well as the use of sequencing approaches to characterize the rare functional variants that are not interrogated in studies based on genotyping arrays.

HLA variation in HIV control

The HLA locus in infectious diseases

The human major histocompatibility complex (MHC) located on chromosome 6 is one of the most genetically diverse loci in the genome41. The extended MHC occupies ~7.6 Mb of the human genome42 and encodes more than 400 genes, many of which are key mediators of the innate and adaptive immune responses. Within this locus, alleles at the classical class I (HLA-A, HLA-B, HLA-C) and class II (HLA-DR, HLA-DQ, HLA-DP) genes have been associated with numerous autoimmune, inflammatory and infectious diseases (reviewed in refs43,44) with recent preprints demonstrating extensive disease associations in large biobanks from multiple populations45,46. In the context of infectious disease, class I HLA proteins present endogenous peptides on the surface of infected cells for recognition by CTLs, triggering the development of an adaptive response. As discussed below, the variability in epitope specificity of HLA proteins and expression levels of HLA class I alleles has a dramatic impact on the progression of HIV disease.

Effects of amino acid variability encoded by HLA alleles

An individual’s genotype at class I HLA genes has been consistently demonstrated to be the major host genetic determinant of HIV spVL and rate of disease progression across geographic contexts and ancestries17,22,47,48,49,50. This observation was put in the genome-wide context by the first GWAS of HIV spVL25 and HIV controllers17 that exclusively identified SNPs in strong linkage disequilibrium with classical HLA-B alleles. Although array-based techniques for the genotyping of DNA samples do not allow for the direct resolution of classical HLA alleles, computational methods leveraging linkage disequilibrium structure between SNPs and sequence-based HLA types in reference populations allow for accurate imputation of classical HLA types from GWAS data51. The application of this technique to a sample of >6,000 PLWH of European ancestry underscored the dramatic effect of HLA-B*57:01 on reducing viral load, which was, on average, ~0.8 log10 RNA copies/ml lower in individuals carrying this allele52. This study also demonstrated strong associations at multiple other classical class I HLA alleles that had a range of effects, from decreasing spVL (B*57:01, B*27:05, B*13:02, B*14:02, C*06:02, C*08:02, C*12:02) to increasing spVL (B*07:02, B*08:01, C*07:01, C*07:02, C*04:01).
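To put the size of the B*57:01 effect on a linear scale, the short calculation below converts a difference expressed in log10 RNA copies/ml into a fold change. The function name is ours and serves only as an illustration; the 0.8 value is the average reported above.

```python
def log10_difference_to_fold_change(delta_log10):
    """Convert a difference in log10 viral load into a linear fold change."""
    return 10 ** delta_log10


fold = log10_difference_to_fold_change(0.8)
print(f"A 0.8 log10 reduction is about a {fold:.1f}-fold lower viral load")
```

In other words, carriers of this allele have, on average, a viral load roughly six times lower than non-carriers.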

To better understand how functional variation in HLA class I proteins can impact HIV spVL, recent studies have tested variable amino acid positions within these proteins to fine-map the classical allele associations. In a GWAS performed by the International HIV Controllers study, this technique was applied to demonstrate that previously identified associations between HIV control and classical HLA alleles such as B*57:01 could be explained by variability across a small number of amino acid positions within the HLA-B protein17. The strongest effect was observed at position 97 of the protein, which accommodates six alternative amino acids, including valine, which is unique to B*57 haplotypes. A recent preprint describing the comprehensive analysis of the impact of HLA amino acid polymorphisms on spVL in a multi-ethnic sample of >12,000 PLWH identified three amino acid positions in HLA-B (positions 67, 97 and 156) and one in HLA-A (position 77) as independently associating with spVL53. The positions within HLA-B map to classical HLA alleles known to impact spVL, whereas the HLA-A position suggests that HLA-A functions independently of HLA-B. Interestingly, all four positions are located in the peptide-binding groove of the respective HLA protein, supporting the hypothesis that epitope presentation is key for the natural suppression of HIV replication. Furthermore, there was no substantial evidence that the effects of these polymorphic positions differed across ancestry groups, suggesting biological relevance across global contexts.

Several mechanisms of action have been proposed to explain why different alleles of the same HLA gene have differential effects on HIV progression. Studies of epitope specificity have shown that certain protective alleles, including B*57:01 (which uniquely carries valine at position 97) and B*27:05 (which carries the protective cysteine and asparagine residues at positions 67 and 97, respectively), drive compensatory mutations in the HIV genome leading to reduced viral fitness54,55,56. In addition to differential epitope specificity, the CTL effector function induced by epitope presentation has been implicated in HIV control, with CTLs in carriers of some protective HLA alleles exhibiting an enhanced proliferative capacity and more polyfunctional responses57,58,59.

Within-host diversity and epitope presentation

In addition to the impact of specific class I HLA alleles on HIV progression at the population level, the within-host diversity of HLA alleles may be important at the individual level. An early study looking at the impact of allele combinations revealed that maximum heterozygosity at HLA class I genes (that is, individuals carrying two different alleles at all three class I genes) was associated with a delayed onset of AIDS47. This observation was supported by a GWAS that showed that individuals carrying different HLA alleles at each class I gene had a significantly lower viral load than homozygous individuals, even after accounting for the additive effect at each allele60. This heterozygote advantage likely comes from the ability to present multiple HIV epitopes, supporting the hypothesis that the breadth of presentation is beneficial in preventing HIV progression. To further test this hypothesis, a recent in silico study used novel algorithms to predict the binding affinity of all possible 9-mer peptides in the HIV proteome to HLA proteins encoded by the different class I alleles61. Coupling these predicted affinities to clinical and genetic data demonstrated that spVL was negatively correlated with the breadth of the peptide repertoire bound by an individual’s HLA protein isoforms. Moreover, HLA-B isoforms had the largest predicted breadth of epitope recognition and conferred the strongest reduction of viral load (Fig. 2a). However, the quantity of epitopes alone is unlikely to fully explain the protective capacity of an individual’s HLA alleles, as subsets of epitopes that are uniquely presented by protective HLA isoforms explained more of the observed variance in spVL than the entire predicted set. This observation is further supported by an in silico and functional study that demonstrated that HIV epitopes that encode structurally important residues are preferentially targeted by protective HLA isoforms and associate with elite control of replication62. Thus, the quantity and quality of HIV epitopes presented by combinations of HLA isoforms are the key drivers of spVL.
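The notion of breadth of presentation can be made concrete with a toy calculation. The sketch below counts the distinct peptides predicted to bind at least one of the HLA isoforms carried by an individual. The allele labels, peptide labels and binding table are placeholders invented for illustration; they are not predictions from the cited study, which relied on dedicated binding-affinity algorithms.

```python
def presentation_breadth(hla_genotype, binding_table):
    """Count distinct peptides bound by at least one carried HLA allele."""
    presented = set()
    for allele in hla_genotype:
        presented |= binding_table.get(allele, set())
    return len(presented)


# Hypothetical binding table: allele -> set of peptides predicted to bind.
binding_table = {
    "allele_1": {"pep_a", "pep_b", "pep_c"},
    "allele_2": {"pep_c", "pep_d"},
}

homozygous = ("allele_1", "allele_1")
heterozygous = ("allele_1", "allele_2")

print(presentation_breadth(homozygous, binding_table))    # 3
print(presentation_breadth(heterozygous, binding_table))  # 4
```

Because repertoires are combined as a set union, a second distinct allele can only maintain or increase breadth, which is one simple way to picture the heterozygote advantage. The sketch deliberately ignores epitope quality, which the studies above indicate also matters.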

Fig. 2: Classical and non-classical effects of HLA class I on HIV suppression.

a | HIV-infected cells expressing protective HLA-B alleles tend to present a more diverse and more structurally conserved set of HIV epitopes compared to non-protective alleles. Interactions with protective alleles tend to produce a more polyfunctional cytotoxic T lymphocyte (CTL) response. b | HLA-C protein isoforms vary broadly in their level of expression on the surface of infected cells. HLA-C alleles that do not have a binding site for microRNA-148a (miRNA-148a) in the 3′ untranslated region of their mRNAs escape suppression and present more peptide on the cell surface than alleles with an miRNA-148a binding site, resulting in the initiation of stronger CTL responses. c | Different HLA-A alleles express different amounts of HLA-A signal peptide, which positively correlates with HLA-E peptide expression. HLA-E interacts with the NKG2A receptor on the surface of natural killer (NK) cells and, when highly expressed, inhibits the killing of infected cells.

Non-classical effects of HLA variation

In addition to the classical effects of HLA genes on peptide presentation, several studies have suggested that non-classical effects may play a part in limiting HIV replication in vivo. In particular, the variable expression levels of classical HLA-C alleles have been linked to HIV control, with those expressed at high levels conferring protection against disease progression63. This effect has been observed across ancestries and has been linked to the absence of a variable microRNA-148a (miR-148a) binding site in the 3′ untranslated region of HLA-C64. The proposed model suggests that mRNA from alleles lacking the miR-148a binding site escapes suppression by miR-148a; as a consequence, proteins encoded by these alleles and loaded with HIV epitopes are expressed at higher levels on infected cells, allowing for greater rates of detection by CTLs64 (Fig. 2b). Similarly, proteins encoded by HLA-A alleles are also expressed at variable levels on the cell surface65. However, in contrast to HLA-C, HLA-A alleles expressed at high levels associate with poorer control of viral replication and with faster disease progression66. A combination of genetic and functional studies indicated that increased HLA-A expression levels correlated with higher viraemia in a combined cohort of more than 9,000 PLWH from sub-Saharan Africa and the United States. It was proposed that this effect may be the result of enhanced production of the HLA class I signal peptide that regulates HLA-E expression, a hypothesis that was supported by a correlation between HLA-A expression and HLA-E expression among 58 healthy donors tested66. HLA-E is a ligand for natural killer group protein 2A (NKG2A) and their interaction results in strong inhibition of natural killer (NK) cell degranulation (Fig. 2c). Thus, the enhanced production of the HLA class I signal peptide in individuals carrying highly expressing HLA-A alleles may lead to enhanced inhibition of immune responses in infected individuals, resulting in poorer clinical outcomes.

Finally, it has also been observed that the combination of HLA genotype and the expression of particular killer cell immunoglobulin-like receptor (KIR) proteins variably modulated HIV disease course67. The KIR proteins are a highly variable set of cell-surface receptors expressed on NK cells (and some T cells) that, when engaged by their cognate ligands, either activate or inhibit NK cell-mediated killing (recently reviewed in ref.68). In particular, the combination of the activating KIR3DS1 allele with a set of HLA-B alleles that carry isoleucine in the Bw4 epitope (Bw4-I80) is highly associated with HIV control69. Taken together, these results demonstrate the complex interplay between epitope presentation, HLA protein expression and NK inhibition.

CCR5 variation in HIV infection

CCR5Δ32 and resistance against HIV infection

Perhaps the most highly touted example of human genetic variability restricting infectious diseases is the observation that individuals carrying two copies of a loss-of-function variant in the gene encoding the cell receptor CCR5 are highly resistant to infection by HIV. CCR5 is a chemokine receptor expressed on the surface of multiple subsets of monocytes and lymphocytes, including CD4+ T cells, the major HIV target cells. At the earliest stages of infection, the HIV envelope protein gp120 binds CD4 and CCR5 on the cell surface, resulting in fusion of the viral and host cell membranes and in the release of the viral genome into the target cell. The discovery that individuals who carry homozygous loss-of-function alleles at CCR5 are resistant to infection was first made in a group of MSM who were multiply exposed to the virus but remained uninfected19. It was determined that these men all shared a 32-bp deletion in the CCR5 gene (the CCR5Δ32 allele) that leads to the production of a non-functional protein; the resulting absence of functional CCR5 on the cell surface prevents HIV from entering target cells (Fig. 3a). The CCR5Δ32 allele is observed at ~10% frequency in individuals of European ancestry (homozygosity occurs at a frequency of 1%), at a reduced frequency in southern Europeans compared to those in the north70 and is not observed at an appreciable frequency in other continental populations. Compound heterozygotes (that is, individuals carrying one copy of CCR5Δ32 and a second loss-of-function CCR5 variant) are also resistant to infection, although these individuals are exceedingly rare71.
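The relationship between the ~10% allele frequency and the 1% homozygosity quoted above follows from simple arithmetic if Hardy–Weinberg equilibrium is assumed. That assumption is ours, as are the function and genotype labels in the sketch below.

```python
def hardy_weinberg(q):
    """Expected genotype frequencies for a biallelic locus.

    q is the frequency of the deletion allele; random mating is assumed.
    """
    p = 1.0 - q
    return {
        "wt/wt": p * p,          # two functional copies
        "wt/del32": 2 * p * q,   # heterozygous carriers
        "del32/del32": q * q,    # homozygous for the deletion
    }


freqs = hardy_weinberg(0.10)
for genotype, freq in freqs.items():
    print(f"{genotype}: {freq:.2%}")
```

Under this assumption, about 1% of individuals are expected to be homozygous for the deletion and about 18% to be heterozygous carriers.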

Fig. 3: CCR5 expression modifies HIV progression.

a | A 32-bp deletion in CCR5 (CCR5Δ32) results in the reduced expression of CC-chemokine receptor 5 (CCR5) on the surface of target cells. Heterozygous individuals exhibit reduced CCR5 expression, lower set point viral loads (spVLs) and slower disease progression. Individuals carrying two defective copies of the CCR5 gene show no surface expression and are highly resistant to HIV infection. Additionally, rs1015164, a single nucleotide polymorphism downstream of CCR5, affects cell surface levels of CCR5. In homozygous reference (A/A) and heterozygous (A/G) individuals, the surface expression of CCR5 is normal, whereas G/G homozygous individuals have a lower CCR5 surface expression and lower spVLs. b | rs1015164 regulates the expression of an antisense RNA termed CCR5-AS. When CCR5-AS (orange) is expressed at high or intermediate levels in homozygous reference (A/A) and heterozygous (A/G) individuals, respectively, CCR5 mRNA (blue) is protected from RALY-mediated degradation and results in normal levels of surface expression. In G/G homozygous individuals, CCR5-AS expression is diminished and CCR5 mRNA is degraded, resulting in lower surface expression at the cellular level and a lower spVL overall. lncRNA, long non-coding RNA.

The observation that individuals lacking CCR5 expression are resistant to HIV infection directly led to the development of the antiviral drug Maraviroc, a CCR5 antagonist72, as well as to the world’s first ethically fraught attempt at human embryo engineering73. Perhaps most interestingly, bone marrow transplants between CCR5Δ32 homozygous donors and recipients with HIV infection have resulted in the only two confirmed cases of long-term HIV cure74,75. Although promising, this effect has been difficult to replicate in engineered autologous stem cell models76 and is unlikely to be scalable to the level necessary to stem the pandemic. Additionally, the protection is not absolute, as several confirmed cases of infection in CCR5Δ32 homozygotes have been reported (reviewed in ref.77), presumably by viruses that utilize the minor co-receptor CXCR4 or by dual-tropic viruses.

Associations between CCR5 variation and spontaneous HIV control

In addition to the impact of homozygosity on preventing infection, individuals with a single CCR5Δ32 copy exhibit lower spVL and delayed disease progression compared to those with two functional copies19,20,78, likely because the reduced levels of CCR5 protein on the cell surface lower the efficiency of HIV entry into target cells (Fig. 3a). The CCR5 locus was also identified in GWAS, first in a study of ~2,500 PLWH in Europe16 and then in an expanded set of 6,300 individuals from across the globe52. However, the CCR5Δ32 allele was not directly assayed on the genotyping platforms used in these studies, thus only proxy SNPs were identified. In a combined analysis of GWAS data and direct CCR5Δ32 genotyping, it was observed that the CCR5Δ32 allele was not the most strongly associated variant in the region, suggesting that multiple independent genetic effects occur at this locus. Conditional analysis accounting for the effect of CCR5Δ32 showed that an additional marker, rs1015164, was also strongly associated with spVL. Functional analysis of this variant showed that it regulates the expression of an antisense long non-coding RNA called CCR5-AS, which overlaps the CCR5 gene79. This study further showed that the increased expression of CCR5-AS resulted in increased CCR5 expression because CCR5-AS interfered with the RALY-mediated degradation of CCR5 mRNA. Moreover, the knockdown of CCR5-AS reduced the susceptibility of CD4+ T cells to HIV-1 infection ex vivo (Fig. 3b). These results demonstrate that the clinical course of untreated HIV infection is directly influenced by the innate level of CCR5 expression within the infected individual. Whether additional functional polymorphisms in CCR5 have similar effects remains an open question.

Host and pathogen genetic variation

Pathogen sequence variation as an indicator of host genetic pressure

Most studies performed so far in the field of host genetics focused on clinically defined outcomes such as the susceptibility to infection or disease progression. However, intermediate phenotypes have been shown to be very valuable in identifying subtle genetic association signals that are not always detectable using more complex clinical outcomes. A particularly promising intermediate phenotype, unique by its nature to infectious diseases, is variation in the pathogen genome (Fig. 4). HIV is a highly variable virus that establishes a lifelong infection. Therefore, it represents an ideal model to search for the potential effects of intra-host selective pressure on a human pathogen. While some of the variants observed in the HIV genomic sequence are present in the transmitted/founder virus, another fraction is acquired during the course of the disease resulting, at least partially, from the selective pressure exerted by the host response to infection. Signs of host-driven selection are clearly visible in the HIV genome. In particular, specific variants have been described in key viral epitopes presented by HLA class I molecules and targeted by CTL responses80. Mutations have also been reported in regions targeted by KIR, suggesting the escape from immune pressure by NK cells81,82. A non-negligible fraction of the HIV-1 genome (~12%) is under positive selection but only about half of the positively selected sites map to canonical CD8+ T cell epitopes83, indicating that additional host factors could be driving evolution in non-epitope sites.

Fig. 4: Detecting genomic signatures of host–pathogen interactions in matched host and virus samples.
First, genetic variants in the host (human) and pathogen (viral) genomes are identified from genome-wide genotyping or sequencing data and catalogued. A genome-wide search for associations between human polymorphisms and viral variants is then performed, which needs to consider the risk of systematic signal inflation owing to population stratification. On the human side, this can be addressed by methods that infer genetic ancestry, such as principal component (PC) analysis followed by the inclusion of the top PCs as covariates, or by the use of mixed models that incorporate the full covariance structure of the study population130. On the pathogen side, various phylogenetic-based and model-based approaches have been proposed85. Significant associations, after correction for multiple testing, reveal the loci involved in host–pathogen genomic conflicts. The plot showing the signatures of viral–host interactions is adapted from ref.84, CC BY 3.0 (https://creativecommons.org/licenses/by/3.0/).

Genome-to-genome studies

Computational approaches developed over the past decade have allowed more comprehensive analyses of the reciprocal genetic signals resulting from the host–pathogen interaction84,85. Joint analyses of human and HIV sequence variation start with the generation of large-scale genomic data from paired samples. The retroviral genome can be isolated and sequenced either as native RNA during replicative infection or as proviral DNA, integrated into the host genome, during latent infection. Human genomic information can be obtained using genotyping or sequencing technology. The principle of genome-to-genome (G2G) studies is then to perform a systematic search for associations between human genetic polymorphisms and viral sequence variants, at the nucleotide or amino acid level. Because of the very large number of models run in parallel (one GWAS for each viral variant), this approach requires stringent correction for multiple testing. By mapping all interacting loci, G2G studies have the potential to uncover the most important genes and pathways involved in specific responses to infectious agents, thereby revealing novel diagnostic or therapeutic targets. In addition to identifying the sites of genetic interplay between virus and host, this study design makes it possible to assess the biological consequences of such interactions and to estimate the relative impact of human and viral genetic variation on phenotypic outcomes, by testing for associations between human-driven escape at viral sites and a quantitative clinical phenotype. Despite this promise, it must be acknowledged that G2G studies have not yet led to the identification of novel HIV restriction factors in the human genome86. Future studies will require not only larger sample sizes to increase power but also more diversity, with a strong focus on the inclusion of PLWH of non-European ancestries.
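The G2G principle and its multiple-testing burden can be sketched in a few lines of Python. The example below tests every pair of human variant and viral variant and applies a Bonferroni threshold equal to the significance level divided by the total number of pairs. It is a simplified illustration under stated assumptions: the data are synthetic, the variant names are hypothetical, and Fisher's exact test on carrier status stands in for the regression or mixed models with ancestry and phylogenetic corrections that real G2G studies would use.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as observed.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def g2g_scan(human, viral, alpha=0.05):
    """Test every human variant against every viral variant.

    human: {variant_id: [0/1 carrier status per individual]}
    viral: {site_id: [0/1 viral variant present per individual]}
    Returns the Bonferroni threshold and the list of significant pairs.
    """
    threshold = alpha / (len(human) * len(viral))
    hits = []
    for snp, carriers in human.items():
        for site, present in viral.items():
            pairs = list(zip(carriers, present))
            a = sum(1 for h, v in pairs if h and v)
            b = sum(1 for h, v in pairs if h and not v)
            c = sum(1 for h, v in pairs if not h and v)
            d = sum(1 for h, v in pairs if not h and not v)
            p = fisher_exact_two_sided(a, b, c, d)
            if p < threshold:
                hits.append((snp, site, p))
    return threshold, hits


# Synthetic paired data for 20 individuals (assumption, for illustration only).
human = {
    "hla_like": [1] * 10 + [0] * 10,  # carriers are the first ten individuals
    "null_snp": [1, 0] * 10,          # unrelated to either viral site
}
viral = {
    "site_A": [1] * 10 + [0] * 10,    # escape variant seen only in carriers
    "site_B": [0, 0, 1, 1] * 5,       # unrelated to either human variant
}

threshold, hits = g2g_scan(human, viral)
print(f"Bonferroni threshold: {threshold}")
for snp, site, p in hits:
    print(f"{snp} ~ {site}: p = {p:.2e}")
```

With millions of human variants and thousands of viral variants, the same division yields a threshold many orders of magnitude below the conventional genome-wide level, which is why sample size is so limiting for this design.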

Nevertheless, studies based on the combined analysis of host and pathogen genomic variation have already demonstrated their potential in other infections. In particular, the use of a similar study design in chronic hepatitis C virus infection highlighted the evolutionary pressure exerted by both innate (interferon-λ) and acquired (HLA class II) immune defence mechanisms87,88. The intra-host evolution of DNA viruses can also be investigated using a G2G approach, as shown in a recent study that revealed several associations between human and Epstein–Barr virus sequence variation in immunosuppressed PLWH89.

HIV precision medicine

Expanding antiretroviral therapy to eradicate HIV

With the now accepted knowledge that PLWH who do not have detectable plasma viral loads cannot transmit the virus to others90, the Joint United Nations Programme on HIV/AIDS (UNAIDS) set an ambitious 90–90–90 target91, whereby 90% of infected people know their status, 90% of those are on antiretroviral therapy and 90% of those are suppressing the virus below the level of detection. Given currently available technologies and the approximately 40 million PLWH, this aspirational treatment target would in practice mean that roughly 32 million people (81% of all PLWH) would be on lifelong antiretroviral therapy. Although this treatment-as-prevention approach would undoubtedly result in decreases in transmission and dramatic increases in life expectancy for the population with HIV infection, it also requires a deeper understanding of how human genetic variation relates to variability in drug toxicity and response to long-term therapy.
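The arithmetic behind the 90–90–90 cascade can be made explicit with a short Python sketch. The function name and parameters are hypothetical, and the population size of 40 million is an assumption taken from the approximate number of PLWH cited in the introduction; the resulting counts scale directly with that input.

```python
def cascade(plwh, diagnosed=0.90, treated=0.90, suppressed=0.90):
    """Return (people on treatment, people virally suppressed) for a care cascade.

    Each fraction applies to the group defined by the previous step, so the
    proportions multiply: 0.9 * 0.9 = 81% on treatment, 0.9**3 = 72.9% suppressed.
    """
    on_treatment = plwh * diagnosed * treated
    virally_suppressed = on_treatment * suppressed
    return on_treatment, virally_suppressed


# Assumed input: roughly 40 million people living with HIV.
on_art, virally_suppressed = cascade(40_000_000)
print(f"on treatment: {on_art / 1e6:.1f} million")
print(f"virally suppressed: {virally_suppressed / 1e6:.2f} million")
```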

HIV pharmacogenetics

In addition to affecting HIV disease progression in untreated individuals, human genetic variability has also been implicated in modifying the response to treatment. A major achievement in the fight against HIV has been the development of multiple, effective therapeutics that target several stages of the viral life cycle. These include entry inhibitors, which prevent the binding of the viral spike protein gp120 to host cell receptors and fusion of the virus with host cell membranes; nucleoside and non-nucleoside reverse transcriptase inhibitors, which prevent the reverse transcription of the viral RNA genome into DNA; integrase inhibitors, which prevent the integration of the viral DNA product into the host genome; and protease inhibitors, which prevent the cleavage of viral polyproteins into their functional subunits (Fig. 5). For several classes of anti-HIV therapy, human genetic variability is known to influence response to the drug, which in some cases leads to severe adverse events and treatment discontinuation92. Paradoxically, the HLA-B allele B*57:01, most notably associated with the control of infection, also predisposes carriers to a severe hypersensitivity reaction to the nucleoside reverse transcriptase inhibitor abacavir93,94; the highly specific binding of abacavir alters the binding pocket of HLA-B*57:01 and triggers reactivity to self-peptides95. Similarly, variants in genes encoding the drug-metabolizing enzymes CYP2B6 (refs96,97,98), CYP2A6 (ref.99), CYP2C9 (ref.100), CYP2C19 (ref.100), CYP3A (ref.101) and ABCC2 (ref.101) have all been associated with slow metabolism of their cognate drugs (Table 1), in some cases leading to drug accumulation in the brain, psychiatric complications and treatment stoppage99. The frequency of many of these polymorphisms varies depending on ancestral background, leading to reduced drug tolerance and therefore reduced efficacy in some populations. For example, the allele CYP2B6*6 (rs3745274), which results in the slow metabolism of efavirenz and nevirapine, two non-nucleoside reverse transcriptase inhibitors recommended for first-line use by WHO until recently, has an approximately twofold higher frequency in some African populations than in Europeans102. This increased frequency and the resulting adverse events led to thousands of cases of treatment discontinuation in Zimbabwe when the nation adopted a single-pill efavirenz-containing regimen103. This example highlights the need to tailor therapy not only to the individual but, in some cases, to the population as well. Newer generations of HIV therapies, such as integrase inhibitors and advanced nucleoside reverse transcriptase inhibitors, have more favourable pharmacokinetic and safety profiles104. However, the effects of long-term treatment with these drugs and any potential interactions with human genetic variability remain to be understood.

Fig. 5: Antiretroviral drugs target multiple stages of the HIV life cycle.
Commonly used antiretroviral drugs target receptor binding, reverse transcription, integration and protease cleavage. Genetic variation in several human genes (in bold) has been shown to modify drug metabolism and contribute to adverse drug reactions (Table 1). CCR5, CC-chemokine receptor 5.

Table 1 Genetic variants known to affect the pharmacokinetics of anti-HIV drugs

Complex trait genomics in HIV medicine

In addition to the direct interactions between host genotype and drug metabolism, patients on long-term HIV therapy also experience early onset of several chronic diseases, including cardiovascular disease105,106,107, metabolic syndrome108, kidney disease109,110 and liver fibrosis111. These conditions are all known to have high heritability in the HIV-uninfected population, and genetic risk factors for type 2 diabetes mellitus112 and cardiovascular disease113 have been shown to be enhanced in PLWH on therapy. Recently, there has been a push to develop PRS in the general population. These scores, built by summing the additive effects of dozens to thousands of genetic variants within an individual, have been shown to have a strong predictive ability for multiple metabolic, inflammatory, tumoural and cardiovascular conditions32. Investigations of PRS in the specific context of HIV-infected individuals receiving long-term antiretroviral therapy have just begun, with recent demonstrations that the prediction of chronic kidney disease can be improved by adding a PRS to the known clinical and pharmacological risk factors114,115 and that a PRS can be useful to identify PLWH at high risk of cardiometabolic diseases who may benefit from preventive therapies116. An important caveat is that PRS are not necessarily transferable across ancestral groups and, as in all areas of genomics, attention should be paid to enhancing diversity and ensuring equity in precision medicine approaches.
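The additive construction of a PRS described above can be sketched as follows. This is a minimal illustration only: the variant identifiers, effect sizes and genotype dosages are hypothetical, and real scores additionally involve variant selection, linkage disequilibrium adjustment and ancestry-aware calibration.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1 or 2 copies per variant).

    dosages: {variant_id: copies of the effect allele carried by the individual}
    weights: {variant_id: per-allele effect size, e.g. a log odds ratio}
    """
    missing = set(weights) - set(dosages)
    if missing:
        raise ValueError(f"missing genotypes for: {sorted(missing)}")
    return sum(weights[v] * dosages[v] for v in weights)


# Hypothetical effect sizes and one individual's genotypes (assumptions).
weights = {"variant_1": 0.12, "variant_2": -0.05, "variant_3": 0.30}
dosages = {"variant_1": 2, "variant_2": 1, "variant_3": 0}

score = polygenic_risk_score(dosages, weights)
print(f"PRS = {score:.2f}")
```

Because the weights are usually estimated in one ancestral group, the same sum can be poorly calibrated in another, which is the transferability caveat noted above.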

Conclusion and future perspectives

Host genomic studies have advanced our understanding of HIV biology in several important ways. Firstly, the demonstration of the dominant impact of HLA variation on HIV progression in the context of the whole genome reinforced the need to focus on T cell responses in vaccine design. Moreover, the ability to accurately infer HLA allele types and protein-level variability from genotyping array data, an approach first piloted in HIV genomic studies, has greatly increased our understanding of how amino acid variability in HLA molecules contributes to multiple medically important traits. Secondly, dense genotyping and large sample sizes enabled the discovery of multiple, independent signals in the CCR5 locus, which provided a deeper understanding of how the expression of CCR5 is regulated and how it modulates HIV infection beyond the known impact of the CCR5Δ32 allele. Finally, amassing genome-wide data for large cohorts of PLWH has enabled the validity of previous candidate gene associations to be assessed, providing a new standard for identifying novel loci of HIV restriction.

In recent years, there have been several barriers to further advancing our understanding of how host genomics affects HIV susceptibility and progression. Firstly, current studies have predominantly included individuals of European ancestry, mirroring the lack of diversity in genomics in general117, which is particularly problematic because the vast majority of PLWH are non-White. The example of the population-specific CCR5Δ32 allele further highlights the need to stretch beyond European cohorts to determine whether other population-specific effects exist. Attaining the large sample sizes required for genomic discovery in non-European populations will require a substantial investment of resources and capacity building in low-income and middle-income countries. Furthermore, understanding the potential function of genetic variants identified in diverse samples will require a shift towards inclusivity across genomics databases118. Secondly, with improvements in HIV care and broad adoption of test-and-treat strategies, the focus of host genomics studies has necessarily shifted away from natural-history phenotypes of infection to intermediate phenotypes, pharmacogenomics of long-term therapy, comorbidities or vaccine response. Thirdly, other classes of genetic variation that are not well captured by genotyping arrays, for example, diversity of KIR alleles and T cell receptor usage (the other partner in the HLA interaction), should be investigated to better understand how genetic variation in key innate and adaptive immune genes impacts disease outcomes. However, capturing these types of variation requires in-depth sequencing to resolve genetic diversity and, in the case of T cell receptor variation, targeted immune assays to capture the relevant cells. Progress on computational methods for inferring variation at some complex loci from genotyping array data51,119 or next-generation sequencing data120,121,122 will greatly aid these efforts.

The full translational potential of host genomics discovery in HIV has yet to be realised. Although the associations between HLA allele type, epitope binding and HIV control have been well established, this knowledge has yet to be translated into an effective preventative or therapeutic vaccine. As mentioned above, treatment of PLWH with CCR5-deficient cells has shown potential as an HIV cure, but several technological improvements in autologous cell editing will be required before it becomes a scalable strategy. In addition to targeting host genes for editing, in vitro studies have also shown that it is feasible to directly target and excise the integrated proviral genome123,124. Although this is an extremely promising strategy, the delivery of the necessary machinery to latently infected cells remains a challenge.

The host genomics approach established in HIV research has since been applied to several other infectious diseases, including those posing substantial threats to human health such as hepatitis C virus125,126, tuberculosis127, malaria128 and even SARS-CoV-2 (ref.129), among others. These studies have time and again uncovered novel therapeutic targets and mechanisms to identify the individuals who are most vulnerable to specific infections. As the world struggles with a novel pandemic-causing RNA virus, the lessons we can learn from how the human genome contributes to variability in outcome have never been more important.