HIV-1 and human genetic variation

Over the past four decades, research on the natural history of HIV infection has described how HIV wreaks havoc on human immunity and causes AIDS. HIV host genomic research, which aims to understand how human genetic variation affects our response to HIV infection, has progressed from early candidate gene studies to recent multi-omic efforts, benefiting from spectacular advances in sequencing technology and data science. In addition to invading cells and co-opting the host machinery for replication, HIV also stably integrates into our own genome. The study of the complex interactions between the human and retroviral genomes has improved our understanding of pathogenic mechanisms and suggested novel preventive and therapeutic approaches against HIV infection.

HIV-1 is the human retrovirus responsible for the HIV/AIDS pandemic, which has claimed more than 30 million lives over the past four decades. HIV infection continues to be a major global public health issue, with currently around 40 million people living with HIV (PLWH). Lifelong antiretroviral therapy (ART) has transformed the disease into a manageable chronic health condition. When available, ART enables PLWH to lead long and healthy lives but there is still no effective vaccine and no cure.
Early in the pandemic, it became clear that the risk of HIV acquisition is highly variable across humans. Socioeconomic and behavioural factors played a central role in this variability with some risk groups, such as intravenous drug users and men who have sex with men (MSM), being disproportionately affected 1. Still, even among the most highly exposed individuals, a fraction remained HIV negative 2,3. Similarly, important differences in the natural course of HIV infection (such as time from infection to AIDS diagnosis and the occurrence of opportunistic infections or malignancies) were only partially explained by known variables such as age and comorbidities 4. Taken together, these clinical and epidemiological observations suggested a role for additional factors in the modulation of the individual response to HIV, including inherited variation in the genes and pathways involved in the retroviral life cycle and in innate or adaptive immunity against the infection.
HIV enters its main target cell, the CD4+ T lymphocyte, by binding to its receptor CD4 and to the co-receptor CC-chemokine receptor 5 (CCR5) 5. This binding event triggers the fusion of the viral and human cell membranes, initiating a complex intracellular life cycle that will lead to the production of new viruses (Fig. 1). The natural immune response against HIV infection relies mostly on CD8+ T cells, also called cytotoxic T lymphocytes (CTLs). Upon primary infection, intense HIV replication results in a very high plasma viral load, measured as copies of the HIV RNA genome per millilitre of plasma, which is then partly controlled by the specific CD8+ T cell response. The very diverse human leucocyte antigen (HLA) class I molecules play a central role in this immune response by presenting small viral fragments, called epitopes, at the surface of infected cells. The recognition of these epitopes by CTLs leads to the elimination of HIV-infected cells. A more efficient immune response is linked to a lower viral load during the chronic phase of an untreated infection and to slower disease progression, though it is unable to eliminate the virus 6.
As a retrovirus, HIV can be described as a genomic pathogen. Indeed, it not only uses the molecular machinery of the infected cell for replication and dissemination but it also has the remarkable capacity to integrate a DNA copy of its RNA genome into a host cell chromosome. By becoming part of the human genome, HIV can persist in long-term cellular reservoirs for decades, making it extremely challenging to develop therapeutic strategies resulting in complete eradication 7 .
To better fight HIV infection, we must once again consider the old Delphic maxim: 'know thyself'. Because HIV is an expert at hijacking human cells and immunity, we have no choice but to improve our understanding of our inner machinery, starting with the most fundamental layer of biological information: the human genome. The exploration of human diversity at the DNA level, long hampered by technological limitations, has been fuelled by the development of new and more powerful tools over the past decades 8. Thanks to progress in our understanding of human genetic diversity, in genotyping and sequencing technology, as well as in bioinformatics and data science, it became possible to search for genetic factors that modulate the individual response to HIV, including the resistance and susceptibility to infection and the natural history of the disease in PLWH 9.
In this Review, we first present an overview of the technological and conceptual developments that have fuelled HIV host genomic research. We then describe the major genetic factors modulating the natural history of HIV infection, in the HLA class I region and the CCR5 locus. Next, we highlight the recent convergence of human and HIV genomics, which allows longitudinal analyses of host-pathogen genetic interactions. Finally, we explain how genomic knowledge is poised to have a positive impact on PLWH, notably through pharmacogenomic interventions and stratification of care based on polygenic risk scores (PRS), before discussing the short-term and long-term perspectives for translational research and clinical applications of human genomics in the field of HIV.

Discovering host genetic variants
The search for human genetic differences that have an impact on HIV-related outcomes was first motivated by clinical observations, namely the striking variability in individual trajectories of patients in the absence of treatment. It was further propelled by a desire to uncover fundamental physiopathological mechanisms by the careful exploration of genomic variants and their impact on host and viral molecular processes.
Candidate gene studies. In the candidate gene approach, population-level associations are sought between HIV-related phenotypes and specific genetic variants in genes that have been selected based on previous biological knowledge or functional work. The selected variants are typically typed using targeted genotyping assays or Sanger sequencing of the region of interest. This framework was first applied to HIV host genetics in the early 1990s in analyses of allelic variants in genes known or suspected to play a role in HIV pathogenesis or in the antiretroviral immune response. Therefore, genetic associations were reported in two broad categories: genes coding for proteins involved in the HIV life cycle (such as PPIA 10 and TLR9 (ref. 13)) as well as in specific antiretroviral defence mechanisms (such as APOBEC3G 14 and TRIM5 (ref. 15)). Dozens of genes were tested in multiple cohorts. Unfortunately, as has been the case in the broader field of human genetics, most reported associations turned out to be false positives, notably owing to the small sizes of the studied cohorts, population stratification and the lack of correction for multiple testing. Replication attempts in larger cohorts, where these factors could be better controlled, showed no association for the vast majority of variants [16][17][18]. In fact, only two major discoveries remain from the candidate gene era: the protective effect of a homozygous 32 bp deletion in CCR5 (CCR5Δ32) against HIV acquisition [19][20][21]
Genome-wide association studies. Advances in genotyping and sequencing technologies progressively transformed human genetic analyses during the first decade of this century. In particular, the commercial availability of genome-wide genotyping arrays marked the beginning of the era of genome-wide association studies (GWAS).
The principle of a GWAS is to simultaneously test very large numbers of genetic variants throughout the genome for potential associations with a phenotype of interest 23. This truly agnostic approach finally allowed for a more comprehensive exploration of the human genome. To date, most GWAS have been based on the genotyping of single nucleotide polymorphisms (SNPs) followed by imputation, a process that leverages the linkage disequilibrium property of the human genome to statistically infer the genotypes that are not directly measured. This approach allows near-comprehensive testing of common variants (that is, variants with a minor allele frequency of >1%) in most human populations 24. The first GWAS of any infectious disease focused on the level of detectable viral genetic material in the blood of untreated, chronically infected individuals during the period of HIV latency 25. This phenotype, known as set point viral load (spVL), was selected because of its relative ease of measurement and its known correlation to the rate of progression to AIDS 26 and transmission potential 27. The spectrum of interrogated variants was limited by early DNA genotyping arrays, yet genome-wide significant associations were identified in the HLA class I region, the most polymorphic locus in the human genome, known to have a crucial role in the modulation of T cell immunity (see 'HLA variation in HIV control', below). These findings were soon validated and expanded by other GWAS performed in independent cohorts, which demonstrated that the genetic architecture of HIV spVL is comparable between the general population of PLWH 16,28-30 and a particular group of individuals able to maintain low viral loads for prolonged periods of time in the absence of ART, the so-called HIV controllers 17,31. The absence of specific genetic factors explaining the HIV controller phenotype was a disappointment in terms of potential therapeutic development.
However, it is consistent with what has been found for many complex human traits and diseases: individuals at the extremes of the phenotypic distribution are more likely to carry multiple common variants with weak effects rather than rare, high-impact variants 32. Beyond genotyping, a single exome sequencing study has been published so far in the HIV field 33, also indicating that rare coding variants with large effect sizes are unlikely to make a major contribution to host control of HIV infection.
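The per-variant testing principle of a GWAS described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline used in the cited studies: the function name `association_test`, the toy genotypes and phenotype values are all hypothetical, the 5e-8 significance threshold is the conventional genome-wide value rather than one given in the text, and the p value uses a normal approximation to the t statistic that is only reasonable for large samples. Real analyses also adjust for covariates such as ancestry principal components.

```python
import math

# Conventional genome-wide significance threshold (an assumption here).
GENOME_WIDE_ALPHA = 5e-8

def association_test(dosages, phenotype):
    """Simple linear regression of a quantitative phenotype on allele dosage.

    dosages: number of copies of the tested allele per individual (0, 1 or 2).
    Returns (beta, p_value). The two-sided p value uses a normal
    approximation to the t statistic, adequate only for large samples.
    """
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_y = sum(phenotype) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosages, phenotype))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_g
    rss = sum((y - intercept - beta * g) ** 2 for g, y in zip(dosages, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return beta, p_value

# Toy data (hypothetical): one variant that lowers spVL, one unrelated variant.
causal = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
null = [0, 1, 1, 2] * 3
noise = [0.1, -0.1] * 6
phenotype = [4.5 - 0.8 * g + e for g, e in zip(causal, noise)]  # log10 copies/ml

beta_causal, p_causal = association_test(causal, phenotype)
beta_null, p_null = association_test(null, phenotype)

for name, p in (("causal", p_causal), ("null", p_null)):
    print(name, "genome-wide significant:", p < GENOME_WIDE_ALPHA)
```

In a real GWAS the same test is repeated for each of the genotyped or imputed variants, which is why such a stringent threshold is applied.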
GWAS were less successful in the search for determinants of HIV resistance, with no definitive evidence found of human genetic polymorphisms conferring an altered susceptibility to HIV apart from CCR5 variation 18,34. However, recent genome sequencing studies of extreme exposure phenotypes 35 have shown promising associations in CD101, a gene encoding an immunoglobulin superfamily member implicated in regulatory T cell function 36, and in UBE2V1, which encodes a ubiquitin-conjugating enzyme involved in pro-inflammatory cytokine expression 37,38 that associates with the HIV restriction factor TRIM5α 38. Although both CD101 and UBE2V1 are plausible candidates, further functional studies are required to validate their role in HIV susceptibility. Finally, analyses of GWAS data provide evidence for residual heritability owing to additive genetic effects beyond CCR5 (ref. 18) and genetic overlap with behavioural and socioeconomic traits 39. These results suggest that larger genomic studies of HIV acquisition may identify additional loci that impact susceptibility and warrant further investigation, potentially in large biobanks.
Several intrinsic limitations make it difficult to investigate the genetic mechanisms potentially involved in HIV resistance. For example, sample sizes are usually small (in the tens or hundreds) because studies need to be performed on highly exposed yet uninfected individuals such as patients with haemophilia exposed to HIV through contaminated blood products 34 , sex workers in hyper-endemic areas 40 or serodiscordant couples (stable heterosexual couples where one partner has HIV infection and the other is seronegative for HIV at enrolment) 29 . Frailty (or survival) bias is a limitation in cross-sectional studies of HIV cohorts with long-term follow-up, as these cohorts are enriched for genetic factors protecting against HIV disease progression. Another limitation is misclassification bias in studies comparing the genomes of patients with HIV infection to unselected controls from the general population, in which most individuals are in fact susceptible to HIV infection 18 . The identification of additional genetic determinants of individual susceptibility to HIV infection will require increased sample sizes (ideally in the thousands) as well as the use of sequencing approaches to characterize the rare functional variants that are not interrogated in studies based on genotyping arrays.

HLA variation in HIV control
The HLA locus in infectious diseases. The human major histocompatibility complex (MHC) located on chromosome 6 is one of the most genetically diverse loci in the genome 41. The extended MHC occupies ~7.6 Mb of the human genome 42 and encodes more than 400 genes, many of which are key mediators of the innate and adaptive immune responses. Within this locus, alleles at the classical class I (HLA-A, HLA-B, HLA-C) and class II (HLA-DR, HLA-DQ, HLA-DP) genes have been associated with numerous autoimmune, inflammatory and infectious diseases (reviewed in refs 43,44) with recent preprints demonstrating extensive disease associations in large biobanks from multiple populations 45,46. In the context of infectious disease, class I HLA proteins present endogenous peptides on the surface of infected cells for recognition by CTLs, triggering the development of an adaptive response. As discussed below, the variability in epitope specificity of HLA proteins and expression levels of HLA class I alleles has a dramatic impact on the progression of HIV disease.

Nature Reviews | Genetics

Human leucocyte antigen (HLA)
A protein, encoded by one of a group of HLA genes, that presents antigens that train the adaptive immune response. HLA genes are highly variable and allelic variants encode proteins that are differentially able to present antigens based on the amino acid sequences in the peptide-binding grooves.

Epitopes
Parts of an antigen that make contact with a particular antibody or T cell receptor and are thus capable of stimulating an immune response.

Population stratification
Presence of systematic differences in allele frequencies between population subgroups owing to systematic differences in ancestry.

HIV progression
The natural disease course of HIV infection in untreated individuals, characterized by an acute phase, a chronic phase and the development of AIDS. The rate of HIV progression varies dramatically in the infected population.

HIV latency
The long-term persistence of HIV in an integrated but transcriptionally inactive form in the host genome. Because latent HIV resides in memory T cells, it persists indefinitely even in patients on suppressive antiretroviral therapy. This latent reservoir is a major barrier to curing HIV infection.

Polygenic risk scores (PRS)
Statistics that are calculated by enumerating the number of risk alleles associated with a particular phenotype (often weighted by their population-level effect sizes) that are present in a single individual and comparing the individual's score to the distribution of risk scores in the population.
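The glossary definition of a polygenic risk score in this section can be made concrete with a short sketch. It follows the two steps of that definition: a weighted count of risk alleles, then a comparison with the population distribution. The variant names, effect sizes and population scores are invented for illustration, and the function names are hypothetical rather than taken from any published tool.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele counts (0, 1 or 2 copies per variant)."""
    return sum(weights[v] * dosages.get(v, 0) for v in weights)

def percentile_in_population(score, population_scores):
    """Percentage of population scores that are at or below `score`."""
    at_or_below = sum(1 for s in population_scores if s <= score)
    return 100.0 * at_or_below / len(population_scores)

# Hypothetical per-allele effect sizes and one individual's genotype.
weights = {"variant_a": 0.30, "variant_b": 0.10, "variant_c": -0.20}
individual = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

# Hypothetical scores computed the same way in a reference population.
population = [-0.4, -0.1, 0.0, 0.2, 0.3, 0.5, 0.6, 0.9]

score = polygenic_risk_score(individual, weights)
pct = percentile_in_population(score, population)
print(f"score={score:.2f}, percentile={pct:.1f}")
```

In practice the weights would come from GWAS effect estimates, and the reference distribution should match the individual's ancestry, because scores derived in one population transfer imperfectly to others.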

Effects of amino acid variability encoded by HLA alleles.
An individual's genotype at class I HLA genes has been consistently demonstrated to be the major host genetic determinant of HIV spVL and rate of disease progression across geographic contexts and ancestries 17,22,47-50. This observation was put in the genome-wide context by the first GWAS of HIV spVL 25 and HIV controllers 17 that exclusively identified SNPs in strong linkage disequilibrium with classical HLA-B alleles. Although array-based techniques for the genotyping of DNA samples do not allow for the direct resolution of classical HLA alleles, computational methods leveraging linkage disequilibrium structure between SNPs and sequence-based HLA types in reference populations allow for accurate imputation of classical HLA types from GWAS data 51. The application of this technique to a sample of >6,000 PLWH of European ancestry underscored the dramatic effect of HLA-B*57:01 on reducing viral load, which was, on average, ~0.8 log10 RNA copies/ml lower in individuals carrying this allele 52. This study also demonstrated strong associations at multiple other classical class I HLA alleles that had a range of effects, from decreasing spVL (B*57:01, B*27:05, B*13:02, B*14:02, C*06:02, C*08:02, C*12:02) to increasing spVL (B*07:02, B*08:01, C*07:01, C*07:02, C*04:01).
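Because spVL effects are reported on a log10 scale, it can help to convert them to fold changes. The sketch below is plain arithmetic applied to the ~0.8 log10 difference quoted above for HLA-B*57:01. The function name and the example viral load of 30,000 copies/ml are hypothetical.

```python
def fold_change_from_log10_difference(delta_log10):
    """Convert a difference in log10 viral load into a fold change."""
    return 10 ** delta_log10

# Effect size quoted in the text for HLA-B*57:01 carriers.
fold = fold_change_from_log10_difference(0.8)

# Hypothetical non-carrier set point viral load, in RNA copies/ml.
non_carrier_vl = 30_000
carrier_vl = non_carrier_vl / fold

print(f"fold reduction: {fold:.1f}")
print(f"expected carrier spVL: {carrier_vl:.0f} copies/ml")
```

A 0.8 log10 difference therefore corresponds to a viral load that is roughly six times lower on average.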
To better understand how functional variation in HLA class I proteins can impact HIV spVL, recent studies have tested variable amino acid positions within these proteins to fine-map the classical allele associations. In a GWAS performed by the International HIV Controllers study, this technique was applied to demonstrate that previously identified associations between HIV control and classical HLA alleles such as B*57:01 could be explained by variability across a small number of amino acid positions within the HLA-B protein 17 . The strongest effect was observed at position 97 of the protein, which accommodates six alternative amino acids, including valine, which is unique to B*57 haplotypes. A recent preprint describing the comprehensive analysis of the impact of HLA amino acid polymorphisms on spVL in a multi-ethnic sample of >12,000 PLWH identified three amino acid positions in HLA-B (positions 67, 97 and 156) and one in HLA-A (position 77) as independently associating with spVL 53 . The positions within HLA-B map to classical HLA alleles known to impact spVL, whereas the HLA-A position suggests that HLA-A functions independently of HLA-B. Interestingly, all four positions are located in the peptide-binding groove of the respective HLA protein, supporting the hypothesis that epitope presentation is key for the natural suppression of HIV replication. Furthermore, there was no substantial evidence that the effects of these polymorphic positions differed across ancestry groups, suggesting biological relevance across global contexts.
Several mechanisms of action have been proposed to explain why different alleles of the same HLA gene have differential effects on HIV progression. Studies of epitope specificity have shown that certain protective alleles, including B*57:01 (which uniquely carries valine at position 97) and B*27:05 (which carries the protective cysteine and asparagine residues at positions 67 and 97, respectively), drive compensatory mutations in the HIV genome leading to reduced viral fitness [54][55][56] . In addition to differential epitope specificity, the CTL effector function induced by epitope presentation has been implicated in HIV control, with CTLs in carriers of some protective HLA alleles exhibiting an enhanced proliferative capacity and more polyfunctional responses [57][58][59] .

Within-host diversity and epitope presentation.
In addition to the impact of specific class I HLA alleles on HIV progression at the population level, the within-host diversity of HLA alleles may be important at the individual level. An early study looking at the impact of allele combinations revealed that maximum heterozygosity at HLA class I genes (that is, individuals carrying two different alleles at all three class I genes) was associated with a delayed onset of AIDS 47. This observation was supported by a GWAS that showed that individuals carrying different HLA alleles at each class I gene had a significantly lower viral load than homozygous individuals, even after accounting for the additive effect at each allele 60. This heterozygote advantage likely comes from the ability to present multiple HIV epitopes, supporting the hypothesis that the breadth of presentation is beneficial in preventing HIV progression. To further test this hypothesis, a recent in silico study used novel algorithms to predict the binding affinity of all possible 9-mer peptides in the HIV proteome to HLA proteins encoded by the different class I alleles 61. Coupling these predicted affinities to clinical and genetic data demonstrated that spVL was negatively correlated with the breadth of the peptide repertoire bound by an individual's HLA protein isoforms. Moreover, HLA-B isoforms had the largest predicted breadth of epitope recognition and conferred the strongest reduction of viral load (Fig. 2a). However, the quantity of epitopes alone is unlikely to fully explain the protective capacity of an individual's HLA alleles, as subsets of epitopes that are uniquely presented by protective HLA isoforms explained more of the observed variance in spVL than the entire predicted set. This observation is further supported by an in silico and functional study that demonstrated that HIV epitopes that encode structurally important residues are preferentially targeted by protective HLA isoforms and associate with elite control of replication 62. Thus, the quantity and quality of HIV epitopes presented by combinations of HLA isoforms are the key drivers of spVL.

Fig. 2c (caption fragment): HLA-E interacts with the NKG2A receptor on the surface of natural killer (NK) cells and, when highly expressed, inhibits the killing of infected cells.

Genetic architecture
Underlying genetic basis of a given trait, in terms of variant number, effect size, allele frequency and interactions.

HIV controllers
A group of people living with HIV whose plasma HIV RNA load is spontaneously maintained at very low levels for several years (usually at least 3-5 years) in the absence of antiretroviral therapy.

Restriction factor
A host cellular protein that participates in antiviral defence by interfering with specific steps of the viral replication cycle.
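The notion of the breadth of the peptide repertoire bound by an individual's HLA isoforms, discussed in this subsection, can be illustrated as the size of a set union. This is a toy sketch only: the peptide identifiers and the binder sets assigned to each allele are invented, the function name is hypothetical, and the cited study used binding-affinity prediction algorithms rather than a lookup table.

```python
def repertoire_breadth(hla_alleles, predicted_binders):
    """Number of distinct peptides predicted to bind at least one allele."""
    bound = set()
    for allele in hla_alleles:
        bound |= predicted_binders.get(allele, set())
    return len(bound)

# Hypothetical predicted binders per allele (peptide IDs are placeholders).
predicted_binders = {
    "B*57:01": {"pep1", "pep2", "pep3"},
    "B*07:02": {"pep3", "pep4"},
    "A*02:01": {"pep5"},
}

# Two individuals who differ only in their second HLA-B allele.
homozygous_b = ["A*02:01", "A*02:01", "B*57:01", "B*57:01"]
heterozygous_b = ["A*02:01", "A*02:01", "B*57:01", "B*07:02"]

breadth_hom = repertoire_breadth(homozygous_b, predicted_binders)
breadth_het = repertoire_breadth(heterozygous_b, predicted_binders)
print(breadth_hom, breadth_het)
```

A second, different allele can only add peptides to the union, which is one way to picture the heterozygote advantage. As the text notes, breadth alone does not capture which epitopes are presented, so a count like this says nothing about epitope quality.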
Non-classical effects of HLA variation. In addition to the classical effects of HLA genes on peptide presentation, several studies have suggested that non-classical effects may play a part in limiting HIV replication in vivo. In particular, the variable expression levels of classical HLA-C alleles have been linked to HIV control, with those expressed at high levels conferring protection against disease progression 63. This effect has been observed across ancestries and has been linked to the absence of a variable microRNA-148a (miR-148a) binding site in the 3′ untranslated region of HLA-C 64. The proposed model suggests that mRNA from alleles lacking the miR-148a binding site escape suppression by miR-148a; as a consequence, proteins encoded by these alleles and loaded with HIV epitopes are expressed at higher levels on infected cells, allowing for greater rates of detection by CTLs 64 (Fig. 2b). Similarly, proteins encoded by HLA-A alleles are also expressed at variable levels on the cell surface 65. However, in contrast to HLA-C, HLA-A alleles expressed at high levels associate with poorer control of viral replication and with faster disease progression 66. A combination of genetic and functional studies indicated that increased HLA-A expression levels correlated with higher viraemia in a combined cohort of more than 9,000 PLWH from sub-Saharan Africa and the United States. It was proposed that this effect may be the result of enhanced production of the HLA class I signal peptide that regulates HLA-E expression, a hypothesis that was supported by a correlation between HLA-A expression and HLA-E expression among 58 healthy donors tested 66. HLA-E is a ligand for natural killer group protein 2A (NKG2A) and their interaction results in strong inhibition of natural killer (NK) cell degranulation (Fig. 2c).
Thus, the enhanced production of the HLA class I signal peptide in individuals carrying highly expressing HLA-A alleles may lead to enhanced inhibition of immune responses in infected individuals, resulting in poorer clinical outcomes.
Finally, it has also been observed that the combination of HLA genotype and the expression of particular killer cell immunoglobulin-like receptor (KIR) proteins variably modulated HIV disease course 67. The KIR proteins are a highly variable set of cell-surface receptors expressed on NK cells (and some T cells) that, when engaged by their cognate ligands, either activate or inhibit NK cell-mediated killing (recently reviewed in ref. 68). In particular, the combination of the activating KIR3DS1 allele with a set of HLA-B alleles that carry isoleucine in the Bw4 epitope (Bw4-I80) is highly associated with HIV control 69. Taken together, these results demonstrate the complex interplay between epitope presentation, HLA protein expression and NK inhibition.

CCR5 variation in HIV infection

CCR5Δ32 and resistance against HIV infection.
Perhaps the most highly touted example of human genetic variability restricting infectious diseases is the observation that individuals carrying two copies of a loss-of-function variant in the gene encoding the cell receptor CCR5 are highly resistant to infection by HIV. CCR5 is a chemokine receptor expressed on the surface of multiple subsets of monocytes and lymphocytes, including CD4+ T cells, the major HIV target cells. At the earliest stages of infection, the HIV envelope protein gp120 binds CD4 and CCR5 on the cell surface, resulting in fusion of the viral and host cell membranes and in the release of the viral genome into the target cell. The discovery that individuals who carry homozygous loss-of-function alleles at CCR5 are resistant to infection was first made in a group of MSM that were multiply exposed to the virus but remained uninfected 19. It was determined that these men all shared a 32-bp deletion in the CCR5 gene (the CCR5Δ32 allele) that leads to the production of a nonfunctional protein; the resulting absence of functional CCR5 on the cell surface prevents HIV from entering target cells (Fig. 3a). The CCR5Δ32 allele is observed at ~10% frequency in individuals of European ancestry (homozygosity occurs at a frequency of 1%), at a reduced frequency in southern Europeans compared to those in the north 70 and is not observed at an appreciable frequency in other continental populations. Compound heterozygotes (that is, individuals carrying one copy of CCR5Δ32 and a second loss-of-function CCR5 variant) are also resistant to infection, although these individuals are exceedingly rare 71.
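The relationship between the ~10% allele frequency and the ~1% homozygote frequency quoted above follows from Hardy-Weinberg proportions. The sketch below assumes Hardy-Weinberg equilibrium (random mating and no strong selection at the locus), which is an assumption made here for illustration and not a claim from the text. The function name is hypothetical.

```python
def hardy_weinberg_frequencies(q):
    """Expected genotype frequencies for a biallelic locus.

    q: frequency of the deletion allele (here CCR5Δ32).
    Assumes Hardy-Weinberg equilibrium.
    """
    p = 1.0 - q
    return {
        "wild_type": p * p,
        "heterozygous": 2 * p * q,
        "homozygous_deletion": q * q,
    }

# Allele frequency quoted in the text for European-ancestry populations.
freqs = hardy_weinberg_frequencies(0.10)
for genotype, freq in freqs.items():
    print(f"{genotype}: {freq:.2%}")
```

Under this assumption about 1 person in 100 carries two copies of the deletion and about 18 in 100 carry one, which gives a sense of how many individuals fall into the heterozygous group discussed in the next subsection.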
The observation that individuals lacking CCR5 expression are resistant to HIV infection directly led to the development of the antiviral drug Maraviroc, a CCR5 antagonist 72, as well as to the world's first ethically fraught attempt at human embryo engineering 73. Perhaps most interestingly, bone marrow transplants between CCR5Δ32 homozygous donors and recipients with HIV infection have resulted in the only two confirmed cases of long-term HIV cure 74,75. Although promising, this effect has been difficult to replicate in engineered autologous stem cell models 76 and is unlikely to be scalable to the level necessary to stem the pandemic. Additionally, the protection is not absolute, as several confirmed cases of infection in CCR5Δ32 homozygotes have been reported (reviewed in ref. 77), presumably by viruses that utilize the minor co-receptor CXCR4 or by dual-tropic viruses.

Associations between CCR5 variation and spontaneous HIV control.
In addition to the impact of homozygosity on preventing infection, individuals with a single CCR5Δ32 copy exhibit lower spVL and delayed disease progression compared to those with two functional copies 19,20,78, likely because the reduced levels of CCR5 protein on the cell surface lower the efficiency of HIV entry into target cells (Fig. 3a). The CCR5 locus was also identified in GWAS, first in a study of ~2,500 PLWH in Europe 16 and then in an expanded set of 6,300 individuals from across the globe 52. However, the CCR5Δ32 allele was not directly assayed on the genotyping platforms used in these studies, thus only proxy SNPs were identified. In a combined analysis of GWAS data and direct CCR5Δ32 genotyping, it was observed that the CCR5Δ32 allele was not the most strongly associated variant in the region, suggesting that multiple independent genetic effects occur at this locus. Conditional analysis accounting for the effect of CCR5Δ32 showed that an additional marker, rs1015164, was also strongly associated with spVL. Functional analysis of this variant showed that it regulates the expression of an antisense long non-coding RNA called CCR5-AS, which overlaps the CCR5 gene 79. This study further showed that the increased expression of CCR5-AS resulted in increased CCR5 expression because CCR5-AS interfered with the RALY-mediated degradation of CCR5 mRNA. Moreover, the knockdown of CCR5-AS reduced the susceptibility of CD4+ T cells to HIV-1 infection ex vivo (Fig. 3b). These results demonstrate that the clinical course of untreated HIV infection is directly influenced by the innate level of CCR5 expression within the infected individual. Whether additional functional polymorphisms in CCR5 have similar effects remains an open question.

Killer immunoglobulin-like receptor (KIR)
A family of highly polymorphic activating and inhibitory receptors that serve as key regulators of human natural killer cell function.

HIV target cells
The cells primarily infected by HIV, namely CD4+ T cells and macrophages, both of which are key components of a healthy immune system.

www.nature.com/nrg

Host and pathogen genetic variation

Pathogen sequence variation as an indicator of host genetic pressure.
Most studies performed so far in the field of host genetics focused on clinically defined outcomes such as the susceptibility to infection or disease progression. However, intermediate phenotypes have been shown to be very valuable in identifying subtle genetic association signals that are not always detectable using more complex clinical outcomes. A particularly promising intermediate phenotype, unique by its nature to infectious diseases, is variation in the pathogen genome (Fig. 4). HIV is a highly variable virus that establishes a lifelong infection. Therefore, it represents an ideal model to search for the potential effects of intrahost selective pressure on a human pathogen. While some of the variants observed in the HIV genomic sequence are present in the transmitted/founder virus, another fraction is acquired during the course of the disease resulting, at least partially, from the selective pressure exerted by the host response to infection. Signs of host-driven selection are clearly visible in the HIV genome. In particular, specific variants have been described in key viral epitopes presented by HLA class I molecules and targeted by CTL responses 80. Mutations have also been reported in regions targeted by KIR, suggesting the escape from immune pressure by NK cells 81,82. A non-negligible fraction of the HIV-1 genome (~12%) is under positive selection but only about half of the positively selected sites map to canonical CD8+ T cell epitopes 83, indicating that additional host factors could be driving evolution in non-epitope sites.

Fig. 3 (legend labels): CCR5 expression: high, low, none.

RALY-mediated degradation
A mechanism in which the RALY protein binds to the 3′ untranslated region of an mRNA to promote its degradation.

Genome-to-genome studies.
Fig. 4 (panel labels): control for human stratification, for example, using PC analysis; association studies accounting for host and pathogen stratification; control for viral stratification, for example, using a phylogenetic-based approach.

Computational approaches developed over the past decade have allowed more comprehensive analyses of the reciprocal genetic signals resulting from the host-pathogen interaction 84,85. Joint analyses of human and HIV sequence variation start with the generation of large-scale genomic data from paired samples. The retroviral genome can be isolated and sequenced either as native RNA during replicative infection or as proviral DNA, integrated into the host genome, during latent infection. Human genomic information can be obtained using genotyping or sequencing technology. The principle of genome-to-genome (G2G) studies is then to perform a systematic search for associations between human genetic polymorphisms and viral sequence variants, at the nucleotide or amino acid levels. Because of the very large number of models run in parallel (one GWAS for each viral variant), this approach requires stringent correction for multiple testing. By mapping all interacting loci, G2G studies have the potential to uncover the most important genes and pathways involved in specific responses to infectious agents, thereby revealing novel diagnostic or therapeutic targets. In addition to identifying the sites of genetic interplay between virus and host, this study design makes it possible to estimate the biological consequences of such interactions and to estimate the relative impact of human and viral genetic variation on phenotypic outcomes by assessing associations between human-driven escape at viral sites and a quantitative clinical phenotype. In spite of these promises, it must be acknowledged that G2G studies have not led, as of today, to the identification of novel HIV restriction factors in the human genome 86. Future studies will require larger sample sizes to increase power but also more diversity with a strong focus on the inclusion of PLWH of non-European ancestries. Nevertheless, studies based on the combined analysis of host and pathogen genomic variation have already demonstrated their potential in other infections.
In particular, the use of a similar study design in chronic hepatitis C virus infection highlighted the evolutionary pressure exerted by both innate (interferon-λ) and acquired (HLA class II) immune defence mechanisms 87,88. The intra-host evolution of DNA viruses can also be investigated using a G2G approach, as shown in a recent study that revealed several associations between human and Epstein-Barr virus sequence variation in immunosuppressed PLWH 89.

[…] 91, where 90% of infected people know their status, 90% of those are on antiviral therapy and 90% of those are suppressing the virus below the level of detection. This aspirational treatment target would practically mean, given currently available technologies, that more than 34 million people would be on lifelong chemotherapy. Although this treatment-as-prevention approach would undoubtedly result in decreases in transmission and dramatic increases in life expectancy for the population with HIV infection, it also requires a deeper understanding of how human genetic variation relates to variability in drug toxicity and response to long-term therapy.
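As a rough illustration of the multiple-testing burden in the G2G design described earlier in this section (one GWAS for each viral variant), the sketch below divides a per-GWAS significance threshold by the number of viral variants tested. The function name, the default threshold of 5e-8 (the conventional genome-wide significance level) and the example count of 3,000 viral variants are assumptions made for illustration; they are not taken from the studies cited here, and a Bonferroni-style correction of this kind is only one possible, conservative choice.

```python
# Hypothetical helper (not from the cited studies): Bonferroni-style
# significance threshold for a genome-to-genome (G2G) scan.
# Assumption: every viral variant is the outcome of one separate GWAS, and
# the conventional per-GWAS threshold (5e-8) is divided by the number of
# viral variants tested.


def g2g_significance_threshold(n_viral_variants: int,
                               per_gwas_threshold: float = 5e-8) -> float:
    """Return the per-test P value threshold for a G2G scan."""
    if n_viral_variants < 1:
        raise ValueError("n_viral_variants must be at least 1")
    return per_gwas_threshold / n_viral_variants


# Illustrative number only: 3,000 viral amino acid variants tested.
threshold = g2g_significance_threshold(n_viral_variants=3000)
print(f"G2G significance threshold: {threshold:.2e}")
```

Because viral variants are often correlated with one another, treating them as independent tests in this way is likely to be conservative.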

HIV precision medicine
HIV pharmacogenetics. In addition to affecting HIV disease progression in untreated individuals, human genetic variability has also been implicated in modifying the response to treatment. A major achievement in the fight against HIV has been the development of multiple, effective therapeutics that target several stages of the viral life cycle. These include entry inhibitors, which prevent the binding of the viral spike protein gp120 to host cell receptors and fusion of the virus with host cell membranes; nucleoside and non-nucleoside reverse transcriptase inhibitors, which prevent the reverse transcription of the viral RNA genome into DNA; integrase inhibitors, which prevent the integration of the viral DNA product into the host genome; and protease inhibitors, which prevent the cleavage of viral polyproteins into their functional subunits (Fig. 5). For several classes of anti-HIV therapy, human genetic variability is known to influence response to the drug, which in some cases leads to severe adverse events and treatment discontinuation 92. Paradoxically, the HLA-B allele B*57:01, most notably associated with the control of infection, also predisposes carriers to a severe hypersensitivity reaction to the […] 101) have all been associated with slow metabolization kinetics of their cognate drugs (Table 1), in some cases leading to drug accumulation in the brain, psychiatric complications and treatment stoppage 99. The frequency of many of these polymorphisms varies depending on ancestral background, leading to reduced drug tolerance and therefore reduced efficacy in some populations. For example, the allele CYP2B6*6 (rs3745274), which results in the slow metabolism of efavirenz and nevirapine, two non-nucleoside reverse transcriptase inhibitors recommended for first-line use by WHO until recently, has an approximately twofold higher frequency in some African populations compared to Europeans 102.
This increased frequency and the resulting adverse events led to thousands of cases of treatment discontinuation in Zimbabwe when the nation adopted a single-pill efavirenz-containing regimen 103 . This example highlights the need to not only tailor the therapy to the individual but, in some cases, to the population as well. Newer generations of HIV therapies, such as integrase inhibitors and advanced nucleoside reverse transcriptase inhibitors, have more favourable pharmacokinetic and safety profiles 104 . However, the effects of long-term treatment with these drugs and any potential interactions with human genetic variability remain to be understood.

Complex trait genomics in HIV medicine.
In addition to the direct interactions between host genotype and drug metabolism, patients on long-term HIV therapy also experience early onset of several chronic diseases, including cardiovascular disease 105-107, metabolic syndrome 108, kidney disease 109,110 and liver fibrosis 111. These conditions are all known to have high heritability in the HIV-uninfected population, and genetic risk factors for type 2 diabetes mellitus 112 and cardiovascular disease 113 have been shown to be enhanced in PLWH on therapy. Recently, there has been a push to develop polygenic risk scores (PRS) in the general population. These scores, built by summing the additive effects of dozens to thousands of genetic variants within an individual, have been shown to have a strong predictive ability for multiple metabolic, inflammatory, tumoural and cardiovascular conditions 32. Investigations of PRS in the specific context of HIV-infected individuals receiving long-term antiretroviral therapy have just begun, with the recent demonstrations that the prediction of chronic kidney disease can be improved through the addition of a PRS to the known clinical and pharmacological risk factors 114,115 and that a PRS can be useful to stratify PLWH at a high risk of cardiometabolic diseases who may benefit from preventive therapies 116. An important caveat is that PRS are not necessarily transferable across ancestral groups and, as in all areas of genomics, attention should be paid to enhancing diversity and ensuring equity in precision medicine approaches.
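The additive construction of a PRS described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method used in the cited studies: the function name, the variant identifiers and the effect weights are all hypothetical, dosages are taken to be counts of the effect allele (0, 1 or 2), and variants without a genotype are simply skipped.

```python
from typing import Mapping

# Minimal PRS sketch (hypothetical names and values, for illustration only).
# Assumptions: weights are per-allele effect sizes from some prior GWAS,
# dosages count copies of the effect allele (0, 1 or 2), and variants that
# were not genotyped in the individual are skipped.


def polygenic_risk_score(dosages: Mapping[str, float],
                         weights: Mapping[str, float]) -> float:
    """Sum effect-size-weighted allele dosages over the scored variants."""
    return sum(weight * dosages[variant]
               for variant, weight in weights.items()
               if variant in dosages)


# Hypothetical effect weights for three made-up variant identifiers.
weights = {"variant_A": 0.2, "variant_B": -0.1, "variant_C": 0.05}
# Hypothetical genotype of one individual.
dosages = {"variant_A": 2, "variant_B": 1, "variant_C": 0}

score = polygenic_risk_score(dosages, weights)
print(f"PRS: {score:.3f}")
```

In practice, the weights are estimated in a reference cohort, which is one reason why scores may transfer poorly to individuals whose ancestry differs from that of the cohort.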

Conclusion and future perspectives
Host genomic studies have advanced our understanding of HIV biology in several important ways. Firstly, the demonstration of the dominant impact of HLA variation on HIV progression in the context of the whole genome reinforced the need to focus on T cell responses in vaccine design. Moreover, the ability to accurately infer HLA allele types and protein-level variability from genotyping array data, an approach first piloted in HIV genomic studies, has greatly increased our understanding of how amino acid variability in HLA molecules contributes to multiple medically important traits. Secondly, dense genotyping and large sample sizes enabled the discovery of multiple, independent signals in the CCR5 locus, which provided a deeper understanding of how the expression of CCR5 is regulated and how it modulates HIV infection beyond the known impact of the CCR5Δ32 allele. Finally, amassing genome-wide data for large cohorts of PLWH has enabled the validity of previous candidate gene associations to be assessed, providing a new standard for identifying novel loci of HIV restriction. In recent years, there have been several barriers to further advancing our understanding of how host genomics affects HIV susceptibility and progression. Firstly, current studies have predominantly included individuals of European ancestry, mirroring the lack of diversity in genomics in general 117 , which is particularly problematic because the vast majority of PLWH are non-White. The example of the population-specific CCR5Δ32 allele further highlights the need to stretch beyond European cohorts to determine if other population-specific effects may exist. Attaining the large sample sizes required for genomic discovery in non-European populations will require a substantial investment of resources and building of capacity in low-income and middle-income countries. 
Furthermore, understanding the potential function of genetic variants identified in diverse samples will require a shift towards inclusivity across genomics databases 118. Secondly, with improvements in HIV care and broad adoption of test-and-treat strategies, the focus of host genomics studies has necessarily shifted away from the natural history of infection phenotypes to intermediate phenotypes, pharmacogenomics of long-term therapy, comorbidities or vaccine response. Thirdly, other classes of genetic variation that are not well captured by genotyping arrays, for example, diversity of KIR alleles and T cell receptor usage (the other partner in the HLA interaction), should be investigated to better understand how genetic variation in key innate and adaptive immune genes impacts disease outcomes. However, capturing these types of variation requires in-depth sequencing to resolve genetic diversity and, in the case of T cell receptor variation, targeted immune assays to capture the relevant cells. Progress on computational methods for inferring variation at some complex loci from genotyping array data 51,119 or next-generation sequencing data 120-122 will greatly aid these efforts.

The full translational potential of host genomics discovery in HIV has yet to be realised. Although the association between HLA allele type, epitope binding and HIV control has been well established, this knowledge has yet to be translated into an effective preventative or therapeutic vaccine. As mentioned above, treatment of PLWH with CCR5-deficient cells has shown potential as an HIV cure, but several technological improvements in autologous cell editing will be required before it becomes a scalable strategy. In addition to targeting host genes for editing, in vitro studies have also shown that it is feasible to directly target and excise the integrated proviral genome 123,124. Although this is an extremely promising strategy, the delivery of the necessary machinery to latently infected cells remains a challenge.
The host genomics approach established in HIV research has since been applied to several other infectious diseases, including those posing substantial threats to human health such as hepatitis C virus 125,126, tuberculosis 127, malaria 128 and even SARS-CoV-2 (ref. 129), among others. These studies have time and again uncovered novel therapeutic targets and mechanisms to identify the individuals who are most vulnerable to specific infections. As the world struggles with a novel pandemic-causing RNA virus, the lessons we can learn from how the human genome contributes to variability in outcome have never been more important.
Published online 24 June 2021