Main

HIV-1 arose through several independent zoonotic transmissions of simian immunodeficiency viruses during the past century1,2,3. Today, HIV-1, along with its less widespread cousin HIV-2, infects more than 30 million people worldwide. Both viruses belong to the Retroviridae, a viral family that has left numerous scars of ancient infections in mammalian genomes; indeed, derelict retroviral sequences constitute as much as 8% of our 'own' DNA4. The evolutionary success of this family contrasts with its deceptive simplicity: HIV-1 can persistently infect humans by subverting the innate and adaptive immune systems, despite encoding only 15 mature proteins. Viral replication at the cellular level proceeds through a series of steps that starts when a virus productively engages cell surface receptors and ends when nascent particles mature into infectious virions (Fig. 1). During this process, HIV-1 exploits a myriad of cellular factors to replicate, whereas host restriction factors fight to suppress this replication5,6. The highly active antiretroviral therapy (HAART) drug cocktails in mainstream use primarily target the reverse transcriptase (RT) and protease (PR) enzymes; they potently suppress viral loads and transmission rates, but complications can arise from compound toxicity and the emergence of resistant strains (Box 1). Advances in structural biology can aid the development of next-generation compounds that are active against previously exploited targets, and can also help define new drug targets and boost the effectiveness of vaccination strategies. This Review proceeds stepwise through the HIV-1 replication cycle, highlighting the impact that major structural biology advances have had on our understanding of viral growth and on the development of new antiretroviral therapies.

Figure 1: Schematic overview of the HIV-1 replication cycle.

Those host proteins that have a role in the replication cycle and are discussed in the text are indicated. The infection begins when the envelope (Env) glycoprotein spikes engage the receptor CD4 and the membrane-spanning co-receptor CC-chemokine receptor 5 (CCR5) (step 1), leading to fusion of the viral and cellular membranes and entry of the viral particle into the cell (step 2). Partial core shell uncoating (step 3) facilitates reverse transcription (step 4), which in turn yields the pre-integration complex (PIC). Following import into the cell nucleus (step 5), PIC-associated integrase orchestrates the formation of the integrated provirus, aided by the host chromatin-binding protein lens epithelium-derived growth factor (LEDGF) (step 6). Proviral transcription (step 7), mediated by host RNA polymerase II (RNA Pol II) and positive transcription elongation factor b (P-TEFb), yields viral mRNAs of different sizes, the larger of which require energy-dependent export to leave the nucleus via the host protein CRM1 (step 8). mRNAs serve as templates for protein production (step 9), and genome-length RNA is incorporated into viral particles with protein components (step 10). Viral-particle budding (step 11) and release (step 12) from the cell are mediated by ESCRT (endosomal sorting complex required for transport) complexes and ALIX, and are accompanied or soon followed by protease-mediated maturation (step 13) to create an infectious viral particle. Each step in the HIV-1 life cycle is a potential target for antiviral intervention165; the sites of action of clinical inhibitors (white boxes) and cellular restriction factors (blue boxes) are indicated. INSTI, integrase strand transfer inhibitor; LTR, long terminal repeat; NNRTI, non-nucleoside reverse transcriptase inhibitor; NRTI, nucleoside reverse transcriptase inhibitor.

Viral entry

The HIV-1 envelope spikes comprise trimers of non-covalently linked heterodimers consisting of the surface glycoprotein gp120 and the transmembrane glycoprotein gp41 (Refs 7, 8, 9). When triggered, these spikes initiate a cascade of conformational changes that culminates in fusion between the viral and host cell membranes and release of the viral core into the cytoplasm. HIV-1 primarily infects CD4+ T cells and macrophages. An initial interaction between gp120 and the surface receptor CD4 induces the formation of a bridging sheet between the inner and outer domains of the gp120 monomer, exposing the binding site for a second cell surface molecule, typically CC-chemokine receptor 5 (CCR5)10,11,12 (Fig. 1, step 1). Engagement of this co-receptor leads to insertion of the fusion peptide, located at the amino terminus of gp41, into the cell membrane. This event triggers significant rearrangements of the trimerized amino- and carboxy-terminal heptad repeat sequences within gp41, the formation of a six-helix hairpin structure and the apposition and fusion of the viral and host cell membranes13,14,15 (Fig. 1, step 2).

Initial cryo-electron tomography studies provided crucial glimpses of the HIV-1 envelope and its associated conformational flexibility7,8, although the low-resolution models that were generated left many key aspects of the native structure unresolved9,16,17. Higher-resolution crystallographic studies using engineered HIV-1 glycoprotein constructs have been instrumental in developing entry inhibitors and elucidating the mechanistic basis of virus neutralization by antibodies. Recent studies have highlighted the striking flexibility of the core gp120 structure, which allows extreme conformational changes following CD4 engagement without destabilizing the interaction with gp41 (Refs 12, 18). CD4 binds gp120 at a depression formed between the inner and outer domains, where the CD4 residue Phe43 partially fills a hydrophobic cavity10 (Fig. 2a). Small molecules designed to bind and extend further into this pocket display antiviral activity; thus, increasing the affinity of such molecules for gp120 might lead to the development of clinically useful inhibitors19.

Figure 2: Binding of CD4 and a CD4-mimicking antibody to the gp120 core.

a | The structure of the HIV-1 glycoprotein gp120 in complex with cellular CD4 (Protein Data Bank (PDB) accession 3JWD). Only immunoglobulin-like domain 1 (D1) of CD4 is shown; the Phe43 side chain is depicted as sticks. b | The VRC01 antibody–gp120 co-crystal structure (PDB accession 3NGB), oriented as in part a. Only the variable domains of the heavy (VH) and light (VL) chains of the antibody are shown.

Most antibodies directed against gp120 are strain specific and, moreover, fail to neutralize the virus. However, several groups recently described patient-derived gp120-reactive antibodies with broad HIV-1 neutralization activity20,21,22,23,24. One group in particular took a structure-based approach to stabilize the CD4-bound conformation of gp120 using disulphide bonds, and redesigned its surface to mask positions that are exterior to the CD4-binding site21,22. Using one such construct as bait and peripheral blood mononuclear cells from patients with AIDS, they isolated B cell clones that produced antibodies with remarkably broad neutralizing activity. Structural characterization of these antibodies revealed that, when bound to gp120, the heavy chains of the immunoglobulins mimic CD4 (Fig. 2a,b), with their epitopes almost precisely overlapping the primary CD4-binding site on gp120 (Refs 22, 25). These results define the structural basis for HIV-1 neutralization by antibodies that engage the CD4-binding site. Interestingly, immunoglobulins isolated from the sera of different donors using the resurfaced gp120 construct were derived from the same precursor heavy-chain gene (IGHV1-2*02), which had subsequently undergone extensive affinity maturation21,22,25. The requirement for extensive somatic mutation to achieve viral neutralization21,22 might pose a challenge for the experimental elicitation of such antibodies. However, the recent discovery of highly potent gp120-binding antibodies with alternative modes of action suggests that there are multiple genetic pathways to achieve cross-clade HIV-1 neutralization20,23,24. These results should encourage attempts to design immunogens that elicit broadly neutralizing humoral immunity for vaccination purposes.

Peptides derived from gp41 N-terminal26 or C-terminal27 sequences, which disrupt formation of the six-helix bundle and hence membrane fusion, possess potent antiviral activity. Enfuvirtide, a peptide based on the C-terminal sequence, was licensed as Fuzeon (Roche) in 2003, although the requirement for twice-daily injections, combined with the frequent appearance of resistance mutations in gp41, has limited its utility. D-peptides that target a pocket at the base of the gp41 N-terminal helical structure are also potent antivirals and may overcome some of the limitations associated with Fuzeon use28.

Post-entry events: uncoating to integration

The HIV core, which houses the replication enzymes RT and integrase (IN) as well as the viral genomic RNA, is encased by a cone-shaped shell29 composed of the viral capsid (CA) protein. Recent work has revealed the interactions that occur among individual CA molecules and underlie the structural integrity and functionality of the protective shell30,31,32.

Uncoating. Partial CA shell dissolution, which is required for reverse transcription33,34, is a recently verified therapeutic target35 (Fig. 1, step 3). Moreover, the underlying features of the assembled shell seem to determine its propensity to uncoat32. CA, which comprises independently folded N-terminal and C-terminal domains (NTD and CTD, respectively) connected by a flexible linker36,37, can assemble into ring structures containing five or six protomers31,32 (Fig. 3a,b). The rings further congregate to form a fullerene-like cone that is composed predominantly of hexamers, but also contains seven pentamers at the wide end and five at the narrow end. This arrangement produces shape declinations33,38 (Fig. 3c), and the flexibility of intramolecular NTD–CTD and intermolecular CTD–CTD interactions further contributes to the curvature of the shell lattice30,32 (Fig. 3a,b). The high concentration of pentameric declinations that is expected at the narrow end of the cone may also serve to initiate uncoating32.
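
The figure of 12 pentamers (seven at the wide end plus five at the narrow end) is what Euler's polyhedron formula, V - E + F = 2, demands of any closed shell tiled only by pentagons and hexagons. The Python sketch below assumes an idealized fullerene in which each CA ring counts as one polygonal face and exactly three faces meet at every vertex; the function names are illustrative, and the model ignores where the pentamers sit, which is what distinguishes a cone from a sphere or a tube.

```python
from fractions import Fraction


def euler_characteristic(pentamers: int, hexamers: int) -> Fraction:
    """V - E + F for an idealized closed shell of pentagonal and hexagonal
    faces, assuming (fullerene idealization) three faces meet at each vertex."""
    faces = pentamers + hexamers
    edges = Fraction(5 * pentamers + 6 * hexamers, 2)  # each edge borders two faces
    vertices = Fraction(2, 3) * edges                   # 3V = 2E: three edges per vertex
    return vertices - edges + faces


def can_close(pentamers: int, hexamers: int) -> bool:
    """A closed, sphere-like shell requires V - E + F = 2."""
    return euler_characteristic(pentamers, hexamers) == 2


# Only P = 12 closes the shell, whatever the (illustrative) hexamer count.
print([p for p in range(30) if can_close(p, 200)])  # [12]
```

Because V - E + F reduces algebraically to P/6 under these assumptions, the hexamer count cancels out: the lattice can grow or change shape by adding hexamers, but closure always requires exactly 12 pentamers.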

Figure 3: HIV-1 capsid structures.

a | The crystal structure of the hexameric full-length HIV-1 capsid (CA) protein assembly (Protein Data Bank (PDB) accession 3H47). Individual subunits are coloured by chain, with the amino- and carboxy-terminal domains (NTD and CTD, respectively) of each subunit indicated. b | The crystal structure of the pentameric full-length HIV-1 CA assembly (PDB accession 3P05). c | Stereo view of the model for the complete HIV-1 capsid, based on the crystal structures32. NTDs of the hexameric and pentameric CA units are shown in blue and yellow, respectively; CTDs are green. d | The HIV-1 CA NTD in complex with PF-3450074 (PDB accession 2XDE). The orientation is a 100° rotation compared with the blue NTD in part a. Residues that are crucial for PF-3450074 binding, as revealed by resistance mutations44, are indicated.

TRIM5α, a potent HIV-1 restriction factor isolated from rhesus macaques39, recognizes the assembled CA structure to accelerate uncoating40 and activate innate immune signalling pathways41. Replacing the N-terminal RING domain of rhesus TRIM5α with that of the related human protein TRIM21 yielded a chimaera that was amenable to recombinant expression and purification42. The hybrid construct formed two-dimensional hexameric crystalline arrays in the presence of a higher-order six-fold lattice of HIV-1 CA43. Such CA-templated multimerization may underlie functional HIV-1 restriction by rhesus TRIM5α through a pattern recognition mechanism, a feature shared by other components of the innate immune system41. Stimulating premature uncoating could also be a useful therapeutic approach; for example, PF-3450074, a small-molecule inhibitor of HIV-1 replication that binds to a pocket within the NTD of CA (Fig. 3d), may work by triggering premature uncoating through destabilization of CA–CA interactions35,44.

Viral DNA synthesis. Reverse transcription and integration of the resultant linear viral DNA molecule into a host cell chromosome occur within the context of nucleoprotein complex structures that are derived from the viral core (Fig. 1, steps 4–6). High-resolution HIV-1 RT structures have been available for a number of years, with initial drug- and nucleic acid template-bound crystal structures reported nearly two decades ago45,46.

HIV-1 RT is a heterodimer composed of p66 and p51 subunits, with p66 harbouring two functional active sites: an N-terminal RNA- and DNA-dependent DNA polymerase and a C-terminal RNase H that digests the RNA component of RNA–DNA hybrids. The polymerase domain resembles a right hand with four subdomains: fingers, thumb, palm and connection45,46,47,48 (Fig. 4a). During DNA polymerization, Mg2+ cations coordinated by the catalytic residues Asp110, Asp185 and Asp186 from the palm subdomain activate the DNA primer 3′-hydroxyl group and stabilize the hypothetical pentavalent α-phosphorus intermediate state within the substrate 2′-deoxyribonucleoside 5′-triphosphate (dNTP), incorporating the nucleotide into the growing DNA chain and liberating free pyrophosphate48 (Fig. 4b).

Figure 4: Structural analyses of HIV-1 reverse transcriptase function and its inhibition by small molecules.

a | Overview of the HIV-1 reverse transcriptase (RT)–template–primer complex (Protein Data Bank (PDB) accession 1RTD). The subdomains of the active RT subunit are indicated (the fingers, thumb, palm and connection domains of the amino-terminal polymerase, and the RNase H domain at the carboxyl terminus); p51 is the inactive RT subunit. The structure contains a bound molecule of dTTP (shown as sticks) in the active site. Grey spheres are Mg2+ ions. b | Close-up of the RT active site (PDB accession as in part a) and DNA polymerization. The 3′-hydroxyl group, absent in the original structure48, is added for illustration purposes. The direction of nucleophilic attack is indicated by a dashed arrow. The primer, dTTP, Met184 (mutation of which results in resistance to oxathiolane-containing inhibitors), the catalytic residues and the leaving pyrophosphate group (PPi) are shown as sticks. RT chains are coloured as in part a. c | Stereo view of the ATP-binding pocket in 3′-azido-3′-deoxythymidine (AZT)-resistant HIV-1 RT (PDB accession 3KLE). The excision product (AZT–adenosine tetraphosphate (AZTppppA′)) is shown as sticks, with carbon atoms in grey. Protein chains are semitransparent surfaces (colouring as in part a); residues implicated in AZT resistance are indicated. d | Stereo view of TMC-278 (rilpivirine; shown as sticks with carbon atoms in grey) bound to HIV-1 RT (PDB accession 2ZD1). RT residues forming the binding pocket for the non-nucleoside RT inhibitor are indicated.

Two classes of antiviral drug, nucleoside and non-nucleoside RT inhibitors (NRTIs and NNRTIs, respectively), inhibit DNA polymerization and are core components of HAART (Box 1). Following phosphorylation in infected cells, NRTIs mimic natural dNTPs and are incorporated into the viral DNA by RT. Because they lack the 3′-hydroxyl group that is needed for incorporation of the subsequent nucleotide, NRTIs act as chain terminators. Viral resistance to some of these small molecules occurs through drug exclusion mechanisms. For instance, mutations of Met184 (to Val or Ile) selectively preclude the binding of oxathiolane-containing inhibitors such as 3TC (2′,3′-dideoxy-3′-thiacytidine) while still accommodating dNTPs with normal deoxyribose rings48,49 (Fig. 4b). However, resistance to 3′-azido-3′-deoxythymidine (AZT) and other thymidine analogues puzzled researchers for some time: the mutant RT from AZT-resistant virus strains efficiently incorporates AZT monophosphate into the viral DNA50. Instead of preventing incorporation, the mutant enzyme developed the ability to excise the incorporated drug from the primer strand. Remarkably, RT accomplishes this by using ATP as a pyrophosphate donor to excise the incorporated drug in the form of an AZT–adenosine tetraphosphate adduct, regenerating an active 3′-hydroxyl primer terminus in a reaction that is mechanistically equivalent to the reversal of the polymerization step51,52. Recent structural analyses revealed that the AZT resistance mutations Lys70Arg, Thr215Tyr and Lys219Gln create an optimal ATP-binding site between the fingers and palm subdomains of RT to promote the excision reaction53 (Fig. 4c).
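
The chain-termination and excision chemistry described above can be summarized as a toy model: the forward reaction consumes a dNTP and releases pyrophosphate (PPi), an incorporated NRTI lacking a 3′-OH blocks further extension, and ATP-dependent excision runs the reaction backwards to release a dinucleoside tetraphosphate such as AZTppppA. This is a minimal Python sketch under stated assumptions: the class and function names are hypothetical, nucleotides are reduced to a label and a 3′-OH flag, and base pairing, kinetics and Met184-mediated drug exclusion are not modeled.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Nucleotide:
    label: str                  # illustrative labels, e.g. "dT" or "AZT"
    has_3prime_oh: bool = True  # NRTIs such as AZT lack a 3'-hydroxyl


@dataclass
class Primer:
    residues: list = field(default_factory=list)

    def extendable(self) -> bool:
        # Polymerization needs a 3'-OH on the terminal residue.
        return bool(self.residues) and self.residues[-1].has_3prime_oh


def polymerize(primer: Primer, incoming: Nucleotide) -> str:
    """Forward reaction: primer(n)-3'OH + dNTP -> primer(n+1) + PPi."""
    if not primer.extendable():
        raise ValueError("chain terminated: primer terminus lacks a 3'-OH")
    primer.residues.append(incoming)
    return "PPi"


def atp_excision(primer: Primer) -> str:
    """Reverse reaction with ATP as the pyrophosphate donor:
    primer(n+1) + ATP -> primer(n) + NppppA (dinucleoside tetraphosphate)."""
    if not primer.residues:
        raise ValueError("nothing to excise")
    removed = primer.residues.pop()
    return f"{removed.label}ppppA"


p = Primer([Nucleotide("dG")])
polymerize(p, Nucleotide("dT"))                        # normal extension
polymerize(p, Nucleotide("AZT", has_3prime_oh=False))  # chain termination
print(p.extendable())    # False
print(atp_excision(p))   # AZTppppA
print(p.extendable())    # True: an extendable 3'-OH terminus is restored
```

Writing excision as the literal inverse of `polymerize` mirrors the point made in the text that the two reactions are mechanistically equivalent.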

NNRTIs are allosteric inhibitors that induce the formation of a flexible binding pocket through large conformational changes involving Tyr181, Tyr188 and the primer grip (residues 227–235 within the palm subdomain)45,54,55 (Fig. 4d). Inhibition may result from displacement of the primer grip56 or of the three-stranded β-sheet that contains the catalytic triad55,57. Stacking interactions between the aromatic side chains of Tyr181 and Tyr188 and first-generation NNRTIs such as nevirapine contribute considerably to drug binding45, and mutations at these positions confer resistance through loss of these aromatic interactions58. The Lys103Asn mutation is also widely associated with NNRTI resistance; in the mutant RT, an Asn103–Tyr188 interaction seems to restrict the movement of Tyr188 that is required for drug binding59,60. The more recently developed diarylpyrimidine NNRTIs TMC-125 (also known as etravirine) and TMC-278 (also known as rilpivirine) retain potency in the face of first-generation NNRTI resistance mutations because their inherent flexibility contributes substantially to high-affinity binding to the mutant RT61 (Fig. 4d).

Reverse transcription is inhibited by the cellular restriction factor APOBEC3G, a virion-incorporated cytidine deaminase that impedes elongation62,63 and converts nascent cytidines in viral cDNA to uracils64,65,66. In response, HIV-1 deploys a countermeasure, the protein Vif, which antagonizes the incorporation of APOBEC3G by binding to it and inducing its degradation in virus producer cells67,68. Such observations highlight the importance of the Vif–APOBEC3G nexus for antiviral drug development, and small molecules that limit Vif-mediated degradation of APOBEC3G, and inhibit HIV-1 infection, have been described69,70.
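
The mutational consequence of cDNA deamination can be traced with a short Python sketch. It assumes that strands are plain strings written 5′ to 3′, that deamination occurs at chosen positions of the minus-strand cDNA, and that dU templates dA during plus-strand synthesis; under these assumptions each deaminated dC appears as an A where the plus strand should carry a G. The sequence and function names are hypothetical, and any sequence-context preference of APOBEC3G is ignored.

```python
# Base-pairing partners; a dU in the template pairs with dA.
PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def deaminate(minus_strand: str, positions) -> str:
    """Convert dC to dU at the given 0-based positions of a (-)-strand cDNA."""
    bases = list(minus_strand)
    for i in positions:
        if bases[i] != "C":
            raise ValueError(f"position {i} is {bases[i]!r}, not C")
        bases[i] = "U"
    return "".join(bases)


def copy_strand(template: str) -> str:
    """Complementary strand written 5'->3' (template also written 5'->3')."""
    return "".join(PARTNER[b] for b in reversed(template))


minus = "ACCGT"                            # hypothetical (-)-strand fragment
print(copy_strand(minus))                  # ACGGT
print(copy_strand(deaminate(minus, [1])))  # ACGAT: the G opposite dU became A
```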

APOBEC3G harbours two related domains, each containing cytidine deaminase motifs; the NTD mediates virion incorporation, whereas the CTD is a functional deaminase71,72,73. Several structures of the CTD, derived from NMR74,75,76 and X-ray crystallography77,78, revealed a five-stranded β-sheet intermixed with six helices, with conserved elements of the catalytic zinc coordination motif, (H/C)XEX₂₃₋₂₈PCX₂C, contributed by a pair of α-helices. These results afford important glimpses into the mechanism by which HIV-1 cDNA is deaminated, although additional structures that incorporate the NTD and, especially, the single-stranded DNA substrate should reveal a more complete picture of catalysis. Structures that include Vif should further aid the development of novel antiviral compounds.
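
Motif notation of the form (H/C)-X-E-X(23–28)-P-C-X(2)-C maps directly onto a regular expression: X is any residue and the parenthesized numbers give spacer lengths. The Python sketch below scans synthetic sequences for this motif; the function and variable names are illustrative, and a pattern hit alone does not establish a functional deaminase domain.

```python
import re

# (H/C)-X-E-X(23-28)-P-C-X(2)-C, where X is any residue.
ZINC_MOTIF = re.compile(r"[HC].E.{23,28}PC.{2}C")


def find_zinc_motifs(protein: str) -> list:
    """Return (start, matched_text) for each non-overlapping motif match."""
    return [(m.start(), m.group()) for m in ZINC_MOTIF.finditer(protein)]


# Synthetic sequences, not real APOBEC3G.
spaced_ok = "MK" + "HAE" + "G" * 25 + "PCAAC" + "RR"  # 25-residue spacer
too_short = "HAE" + "G" * 20 + "PCAAC"                # 20-residue spacer
print(find_zinc_motifs(spaced_ok)[0][0])  # 2
print(find_zinc_motifs(too_short))        # []
```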

Integration. The viral enzyme IN possesses two catalytic activities: 3′ processing and DNA strand transfer. During 3′ processing, the 3′ end of the viral DNA at each long terminal repeat (LTR) terminus is cleaved adjacent to the invariant dinucleotide d(C-A), unveiling recessed 3′ termini. IN then uses the 3′-hydroxyls to cut chromosomal DNA strands across a major groove while joining the viral DNA ends to the target DNA 5′-phosphates. Host enzymes complete the integration process by repairing the single-strand gaps abutting the unjoined viral DNA 5′ ends, resulting in establishment of a stable provirus (Fig. 1, step 6). IN does not reverse integration in cells, although rare instances of cell-mediated homologous recombination across the LTRs can excise proviral DNA, leaving a single copy of the LTR behind79. Site-specific recombinases can also be engineered to excise the HIV-1 provirus ex vivo80, although such approaches appear to be far from clinical application.
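
The 3′-processing cut can be expressed as a string operation on one viral DNA strand. In this Python sketch the function name and the input sequence are hypothetical, strands are written 5′ to 3′, and the default trim of two nucleotides reflects the dinucleotide usually described as being removed from HIV-1 DNA ends; strand transfer and gap repair are not modeled.

```python
def three_prime_process(strand: str, trim: int = 2) -> tuple:
    """Toy 3' processing of one viral DNA strand written 5'->3'.

    Cleaves immediately 3' of the d(C-A) nearest the 3' end, searching only
    the last trim + 2 nucleotides. Returns (processed strand, released
    nucleotides); the processed strand ends in ...CA with a free 3'-OH.
    """
    window_start = max(0, len(strand) - trim - 2)
    idx = strand.rfind("CA", window_start)  # index in the full string
    if idx == -1:
        raise ValueError("no d(C-A) dinucleotide near the 3' end")
    cut = idx + 2
    return strand[:cut], strand[cut:]


processed, released = three_prime_process("TTAGCAGT")  # hypothetical LTR end
print(processed, released)  # TTAGCA GT
```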

Although crystal and NMR structures of various fragments of HIV-1 IN were reported over several years81, detailed views of the functional IN–viral DNA nucleoprotein complex, called the intasome, were lacking until recently. Given that clinically useful HIV-1 IN inhibitors selectively interact with the intasome rather than with free IN82, this dearth of structural information limited drug development. Recent successes are due to the application of X-ray crystallography to the tractable intasome of the prototype foamy virus (PFV), a member of the retroviral genus Spumavirus83,84. An overview of these advances is given here; for in-depth reviews, see Refs 85, 86.

The intasome contains a dimer-of-dimers of IN, with only one subunit of each dimer binding a viral DNA end83 (Fig. 5a,b). Thus, akin to RT, functional IN active sites are delegated to a subset of protein molecules within the multimeric complex. The intasome accommodates the target DNA within a cleft between the functional active sites, in a severely bent conformation (Fig. 5b,c). This contortion in the target DNA allows the intasome active sites (which are separated from one another by as much as 26.5 Å) to access their target scissile phosphodiester bonds84. The Asp and Glu residues of the catalytic D,DX₃₅E motif coordinate two divalent metal ions, which activate the 3′-hydroxyl nucleophile and destabilize the target phosphodiester bond during strand transfer83,84 (Fig. 5c). Reversal of the reaction appears to be restricted by a conformational change that, following transesterification, displaces the newly formed viral DNA–target DNA phosphodiester bond by 2.3 Å from the IN active site84.
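
In D,D-35-E notation the number gives the spacer between the second Asp and the Glu, whereas the spacing between the two Asp residues is left unspecified. A short Python check against the HIV-1 residue numbers given in the Figure 5 legend (Asp64, Asp116 and Glu152) shows how the notation is read; the helper name is illustrative.

```python
def spacer_length(first: int, second: int) -> int:
    """Number of residues lying strictly between two residue positions."""
    if second <= first:
        raise ValueError("second residue must follow the first")
    return second - first - 1


# HIV-1 IN catalytic residues, as numbered in the Figure 5 legend.
ASP_1, ASP_2, GLU = 64, 116, 152

print(spacer_length(ASP_2, GLU))    # 35: the fixed spacer in D,D-35-E
print(spacer_length(ASP_1, ASP_2))  # 51: the D-to-D spacing the notation leaves open
```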

Figure 5: Retroviral intasome structures and mechanism of integrase catalysis.

a | Overview of the prototype foamy virus (PFV) intasome structure (Protein Data Bank (PDB) accession 3OY9). Viral integrase (IN) forms a dimer-of-dimers structure in which the two inner subunits are the active subunits, and the two outer subunits are catalytically inactive. The transferred viral DNA strand is the strand that harbours the terminal d(C-A) dinucleotide and becomes joined to chromosomal DNA by the action of the IN strand transfer activity. Active-site carboxylates are shown as sticks, and divalent metal ions as grey spheres. b | The PFV intasome in complex with a host DNA mimic (PDB accession 3OS2). IN subunits are shown in space-fill mode. c | DNA strand transfer. The model is based on structures of the Mn2+-bound intasome and target capture complex (note that IN binds Mg2+ in vivo; see Ref. 84 for details). The active-site Asp and Glu residues of IN (Asp64, Asp116 and Glu152 in HIV-1 numbering) are shown as yellow sticks. DNA is shown as magenta and blue sticks, and the invariant viral dA and dC nucleotides are indicated. The direction of the nucleophilic attack is indicated by a red dashed arrow.

The clinically approved HIV-1 IN inhibitor raltegravir and similar small molecules in development preferentially inhibit DNA strand transfer activity; fortuitously, IN strand transfer inhibitors (INSTIs) harbour broad antiretroviral activity87,88,89. Co-crystal structures of the PFV intasome bound to INSTIs have accordingly been illuminating. INSTIs share two moieties: co-planar heteroatoms (typically three oxygen atoms) that chelate the active-site metal ions90, and halogenated benzyl groups, the function of which was largely speculative until recently. INSTIs engage the bound metal ions while only slightly influencing their positions within the IN active site. The halogenated benzyl groups of the INSTIs assume the position of the terminal adenine ring, primarily through interactions with the penultimate viral DNA G·C base pair and a 3₁₀ helix in IN (Pro145–Gln146 in HIV-1 IN), ejecting the viral 3′-dA (with its associated 3′-hydroxyl nucleophile) from the active site83,88. This displacement of the DNA strand transfer nucleophile forms the mechanistic basis of INSTI action. In addition, INSTIs sterically preclude target DNA binding, explaining the competition between target DNA and these inhibitors82,84. Furthermore, the PFV model has provided important clues about the mechanism of drug resistance associated with HIV-1 IN mutations that are selected in the presence of raltegravir88.

As is the case for RT, there is evidence that a second region of HIV-1 IN, in this case distal from the active site, affords an opportune location for binding of allosteric inhibitors. Lentiviruses such as HIV-1 favour integration within active genes owing to an interaction between IN and the chromatin-binding protein lens epithelium-derived growth factor (LEDGF; also known as transcriptional co-activator p75) (reviewed in Ref. 91). The IN-binding domain of LEDGF is a pseudo HEAT analogous topology (PHAT) domain that consists of two units of a helix–hairpin–helix repeat92. The LEDGF hotspot residues Ile365 and Asp366 at the tip of the N-terminal hairpin nestle into a cleft at the dimer interface of the HIV-1 IN catalytic core domain93. A novel class of HIV-1 IN inhibitors that are capable of suppressing viral replication was recently discovered through a remarkable example of structure-based drug design. These small molecules, termed LEDGINs, bind to the LEDGF-binding cleft at the IN dimer interface, mimicking the LEDGF–IN interaction and thereby blocking this protein–protein binding94. Given the highly conserved nature of INSTI binding at the active site88,95 and the likelihood of considerable cross-resistance among INSTIs96, the development of such allosteric HIV-1 IN inhibitors is highly desirable.

Viral mRNA biogenesis and transport

Integration marks the transition from the early to late phase of HIV-1 replication, in which the focus shifts to viral gene expression followed by the assembly and egress of nascent viral particles. Transcription, which initiates from the U3 promoter within the upstream LTR (Fig. 1, step 7), requires the viral transactivator protein, Tat, for efficient elongation. Viral mRNAs are produced as a variety of alternatively spliced species. The smaller messages are exported readily from the nucleus, whereas the unspliced and singly spliced mRNAs require the action of Rev. This small viral protein acts as an adaptor, binding to the Rev response element (RRE), located within the env mRNA coding region, and to the host nuclear export factor CRM1 (also known as XPO1) (Fig. 1, step 8). Recent structural biology advances have yielded insight into the mechanisms of Tat transactivation97 and Rev-dependent mRNA export98,99.

Transcription elongation. Tat recruits the cellular protein positive transcription elongation factor b (P-TEFb; comprising cyclin-dependent kinase 9 (CDK9) and cyclin T1) to the viral transactivation response (TAR) element present in viral transcripts100,101. Subsequent CDK9-mediated phosphorylation of the heptad repeat residues Ser2 and Ser5 in the CTD of the large subunit of RNA polymerase II stimulates transcription elongation.
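
Ser2 and Ser5 are numbered by position within the CTD's tandem heptad repeats, whose consensus is Tyr-Ser-Pro-Thr-Ser-Pro-Ser (YSPTSPS). The Python sketch below maps those within-heptad positions to absolute indices in an idealized CTD; the function names are hypothetical, and because natural CTD repeats often deviate from the consensus, building the CTD from identical consensus heptads is an assumption.

```python
CONSENSUS_HEPTAD = "YSPTSPS"  # Tyr1 Ser2 Pro3 Thr4 Ser5 Pro6 Ser7


def build_ctd(n_repeats: int) -> str:
    """Idealized CTD made of n identical consensus heptads (an assumption)."""
    return CONSENSUS_HEPTAD * n_repeats


def ser2_ser5_indices(ctd: str) -> list:
    """0-based string indices of Ser2 and Ser5 in each complete heptad."""
    indices = []
    for start in range(0, len(ctd) - 6, 7):  # start of each full heptad
        for position in (2, 5):              # heptad positions are 1-based
            index = start + position - 1
            if ctd[index] == "S":            # skip repeats lacking Ser here
                indices.append(index)
    return indices


print(ser2_ser5_indices(build_ctd(2)))  # [1, 4, 8, 11]
```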

Tat is largely unstructured in the absence of binding partners102. TAR binding occurs primarily via an α-helical Arg-rich motif (ARM), which inserts into the RNA major groove within the stem–loop structure103. The N-terminal activation domain of Tat, which contains acidic, Pro-rich, zinc-binding motifs and core subdomains, assumes an ordered structure on P-TEFb binding97. Within the complex, Tat primarily interacts with the cyclin T1 subunit, also contacting the T loop region of CDK9 (Fig. 6a). Tat binding stimulates CDK9-mediated phosphorylation of Ser2 and Ser5 of RNA polymerase II104; consistent with this, Tat induces reciprocal conformational changes in the kinase that alter the substrate-binding surface of P-TEFb. Crucially, the fact that Tat induces conformational changes in P-TEFb suggests that it should be possible to develop anti-HIV agents that are directed against P-TEFb but have limited side effects on its normal cellular functions97.

Figure 6: Higher-order Tat and Rev structures.

a | Crystal structure of HIV-1 Tat in complex with ATP-bound host positive transcription elongation factor b (P-TEFb) (Protein Data Bank (PDB) accession 3MIA). The protein chains are shown as cartoons (left) or in space-fill mode (right). The N lobe, C lobe and T loop of cyclin-dependent kinase 9 (CDK9) are shown. ATP bound to the active site of CDK9 is shown in stick form. Grey spheres are Zn2+ ions. b | Dimeric assemblies of the HIV-1 Rev core observed in crystals (PDB accessions 2X7L and 3LPH). Rev monomers are coloured by chain, with Arg-rich motifs (ARMs) in blue. The crystal structures illustrate two types of Rev–Rev hydrophobic interfaces, one involving Leu12 and Leu60 and the other involving Leu18 and Ile55. c | Model of the Rev hexamer based on the dimeric structures, shown in space-fill mode. The oligomer projects RNA-binding ARM domains (blue) on one side, with CRM1-binding nuclear export signals (not resolved in the current structures) emanating from the other side.

mRNA export. Rev binds to the RRE in a highly cooperative manner, forming an RNA-dependent dimer en route to a higher-order Rev–RNA multimer105,106. The structural basis for Rev multimerization was recently elucidated by two complementary crystallographic studies98,99. Rev adopts an amphipathic helical hairpin, which multimerizes via face-to-face and back-to-back symmetrical interfaces that are stabilized by conserved hydrophobic interactions (Fig. 6b). Collectively, the crystal structures98,99 describe both types of interface and allow modelling of a Rev multimer, which projects pairs of ARMs on one side and C-terminal nuclear export signals for latching onto the cellular nuclear export factor CRM1 on the other (Fig. 6c). The relative orientations of the ARMs in the context of the oligomer are thought to dictate the selectivity of the viral protein for the RRE structure and sequence. The model also accounts for the cooperativity of RNA binding by Rev, although a more complete structure including the RRE will be required to explain the details of protein–RNA recognition.

Viral egress and maturation

The retroviral structural proteins CA, matrix (MA) and nucleocapsid (NC) are synthesized as parts of the precursor polypeptide Gag, and HIV-1 Gag is sufficient for assembly of virus-like particles at the plasma membrane and for budding of these particles from cells107 (Fig. 1, steps 10 and 11). Through an N-terminal myristic acid108,109 and conserved basic amino acid residues110,111,112, MA contributes to the membrane association of Gag. The differential exposure of the myristic acid, through a process known as the myristyl switch113, allows Gag to associate preferentially with the plasma membrane rather than with intracellular membranes. The switch can be activated by phosphatidylinositol-4,5-bisphosphate114, a phospholipid that is concentrated in the inner leaflet of the plasma membrane and interacts directly with MA115. Several steps along the pathway of HIV-1 assembly and particle release from cells have been targeted for antiviral drug development.

Viral late domains and the cellular ESCRT machinery. Retroviral budding is orchestrated by interactions between Pro-rich motifs in Gag, known as late (L) domains, and cellular class E vacuolar protein sorting (VPS) proteins, the actions of which are required to form the nascent particle and sever it from the plasma membrane. The normal cellular role of VPS proteins is in the formation of multivesicular bodies, a process that is topologically identical to viral budding because in each case a membrane-enclosed vesicle buds away from the cytoplasm; VPS proteins also function in abscission during cell division116,117. Most class E VPS proteins are subunits of ESCRT (endosomal sorting complex required for transport) complexes, which come in four varieties (ESCRT-0, ESCRT-I, ESCRT-II and ESCRT-III). ESCRT-I and ESCRT-II function during membrane budding, whereas ESCRT-III is important for membrane scission. Recent advances have yielded structures of several class E proteins, as well as of the class E protein–L domain interactions that are crucial for viral budding from infected cells (see Refs 118, 119 for in-depth reviews).

The C-terminal HIV-1 Gag cleavage product p6 harbours two L domains, P(T/S)AP and LYPXnL, where Xn denotes one to three residues of any type (Refs 120, 121). The TSG101 component of ESCRT-I engages P(T/S)AP, whereas ALIX (also known as AIP1 and PDCD6IP), itself not formally an ESCRT protein, binds LYPXnL (Refs 121, 122). ALIX contains three domains: an N-terminal Bro1 domain, a central V domain and a C-terminal Pro-rich domain (PRD). Arm 2 of the α-helical V domain interacts with the LYPXnL motif of p6, whereas the boomerang-shaped Bro1 domain and the PRD interact with different isoforms of the ESCRT-III protein CHMP4 and with TSG101, respectively123,124,125,126,127, which accounts for the direct link that ALIX provides between ESCRT-I and ESCRT-III121,128. Highlighting one potential target for the development of inhibitors of HIV-1 budding, the P(T/S)AP domain of p6 inserts into a cleft on the N-terminal UEV domain of TSG101 (Refs 129, 130) (Fig. 7).
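
Because both p6 L domains are short sequence motifs, locating them in a Gag sequence reduces to a pattern search. The sketch below is illustrative only and rests on assumptions of ours: it encodes P(T/S)AP as the regular expression `P[TS]AP` and the ALIX-binding motif (LYP, then one to three residues of any type, then L) as `LYP.{1,3}L`, and it scans an assumed 52-residue p6 sequence (`P6_EXAMPLE`) chosen to be consistent with residues 5–13 (PEPTAPPEE) shown in Fig. 7. The sequence should be checked against a sequence database before any reuse, and the names `P6_EXAMPLE`, `L_DOMAIN_PATTERNS` and `find_late_domains` are hypothetical.

```python
import re

# Assumed p6 sequence, for illustration only (52 residues). Residues 5-13
# match the PEPTAPPEE fragment shown in Fig. 7; verify before reuse.
P6_EXAMPLE = "LQSRPEPTAPPEESFRSGVETTTPPQKQEPIDKELYPLTSLRSLFGNDPSSQ"

# Hypothetical motif table: display name -> regular expression.
#   P(T/S)AP : Pro, then Thr or Ser, then Ala, then Pro.
#   LYPX1-3L : Leu-Tyr-Pro, one to three residues of any type, then Leu.
L_DOMAIN_PATTERNS = {
    "P(T/S)AP": r"P[TS]AP",
    "LYPX1-3L": r"LYP.{1,3}L",
}


def find_late_domains(seq: str) -> list[tuple[str, int, str]]:
    """Return (motif name, 1-based start, matched residues), sorted by start.

    Each pattern is wrapped in a zero-width lookahead so that overlapping
    occurrences are all reported; at a given start position the greedy
    {1,3} quantifier reports the longest spacer that still ends in Leu.
    """
    hits = []
    for name, pattern in L_DOMAIN_PATTERNS.items():
        for m in re.finditer(f"(?=({pattern}))", seq):
            hits.append((name, m.start() + 1, m.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    assert P6_EXAMPLE[4:13] == "PEPTAPPEE"  # residues 5-13, as in Fig. 7
    for name, start, text in find_late_domains(P6_EXAMPLE):
        print(f"{name:10s} at residue {start:2d}: {text}")
```

On the assumed sequence, the scan reports PTAP starting at residue 7 and LYPLTSL starting at residue 35. A sequence match is necessary but not sufficient for function: as the structures discussed above show, engagement of TSG101 or ALIX also depends on how the motif is presented in three dimensions.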

Figure 7: Virus–cell interactions and HIV-1 budding.

The structure of the UEV domain of TSG101 bound to the P(T/S)AP domain of HIV-1 p6 protein (Protein Data Bank accession 3OBU), in cartoon and space-fill modes. p6 (residues 5–13; PEPTAPPEE) is shown as sticks; the carbon atoms of the core L domain, PTAP, and the flanking regions are orange and yellow, respectively. Some of the key TSG101 residues involved in the interaction are indicated on the right.

Restriction of viral egress. The type II transmembrane protein tetherin (also known as CD317 and BST2) inhibits the release of budding particles by retaining them on the plasma membrane of the virus-producing cell131,132 (Fig. 1, step 12). Tetherin consists of a short N-terminal cytoplasmic tail followed by a transmembrane region and a 110-residue ectodomain that ends in an amphipathic sequence, which reconnects the protein to the plasma membrane133. The hydrophobic C-terminal peptide of tetherin, initially thought to be a signal for glycosyl phosphatidylinositol modification, may in fact function as a second transmembrane domain134. The unusual topology of tetherin, with two membrane anchors, prompted several models of virus tethering involving extended or laterally arranged parallel or antiparallel protein dimers at the cell surface131. Several recent X-ray crystal structures revealed that the ectodomain indeed forms a parallel dimeric α-helical coiled coil135,136,137. In addition, tetherin dimers can further assemble head to head into tetramers through the formation of a four-helix bundle136,137. However, mutations designed to ablate tetramer formation do not eliminate tetherin function, indicating that tetramerization is not essential for HIV-1 restriction137. These data highlight the extended coiled-coil dimer of the ectodomain as the likely virus-tethering unit. Ectodomain residues Ala88 and Gly109, which disfavour coiled-coil packing, probably impart some flexibility to the structure, perhaps facilitating insertion of the terminal anchor into the viral membrane136.

HIV-1 Vpu, also a transmembrane protein, counteracts restriction by tetherin131,132 through a mechanism that depends on a direct interaction between the viral and host proteins138,139. Previously elucidated structures of Vpu fragments provided limited insight into how Vpu and tetherin interact, although a recent NMR analysis of their transmembrane peptides embedded in lipid membranes indicates a likely antiparallel helix–helix binding interface140.

Protease and virus maturation. The final step of the viral life cycle, which is mediated by PR and occurs concomitantly with or soon after budding, converts immature particles into infectious virions through proteolysis of the Gag and Gag–Pol precursor polyproteins, yielding the structural components MA, CA and NC, and the enzymes PR, RT and IN141 (Fig. 1, step 13). Cryo-electron tomography revealed the Gag structural rearrangements that occur within immature particles during proteolysis and maturation142,143, and characterized the cellular sites of HIV-1 budding144. Following cleavage of the MA–CA bond, a new β-hairpin forms in CA, stabilized by a salt bridge between the liberated N-terminal Pro1 and Asp51, and this rearrangement triggers core shell assembly145. Recent evidence indicates that the morphological transitions occurring during HIV-1 particle assembly and maturation represent druggable targets. A 12-residue peptide, selected in a phage display screen for binding to the HIV-1 CA CTD, potently inhibited CA assembly in vitro146. Bevirimat, a betulinic acid derivative of herbal origin, inhibits HIV-1 replication by specifically blocking PR-mediated cleavage of the CA–SP1 (spacer peptide 1) junction, thus preventing maturation of the viral core147. Exposure to bevirimat stabilizes the immature CA lattice in HIV-1 virions148. CAP1 is another small molecule that has been reported to induce abnormal HIV-1 core morphologies149. Binding of CAP1 to the CA NTD involves the formation of a deep hydrophobic pocket, which is absent from the unliganded protein and serves as the ligand-binding site150. The binding mode of CAP1 is therefore very different from that of PF-3450074, which engages a pre-existing pocket on the CA NTD surface35 (Fig. 3d). It seems likely that the distortion in CA structure that is associated with CAP1 binding interferes with CA hexamer assembly.

Unlike the structures of the viral enzymes discussed above, which were not determined until after initial discoveries of the respective inhibitors, the structure of full-length PR151,152,153 was determined several years before the approval of the first clinical inhibitor targeting the enzyme154. Accordingly, the development of PR inhibitors has benefited more from structure-based design efforts than the development of other antiretroviral drugs, and readers are directed to Refs 155, 156 for historical accounts of the interplay between PR structure and the development of PR inhibitors and resistance mechanisms.

The nine different peptide sequences within Gag and Gag–Pol that are cleaved by PR display limited primary sequence homology. Co-crystallization of six peptide substrates with PR defined a common volume occupied by the substrates (also called the substrate envelope) and indicated that substrate shape rather than primary sequence is a key predictor of functionality157. The approved PR inhibitors are competitive inhibitors that bind to the active site of the enzyme and occupy a volume known as the inhibitor envelope. Overlays of PR–inhibitor co-crystal structures identified regions of the inhibitor envelope which protrude beyond the substrate envelope to contact amino acid residues of PR that do not contact substrate residues and that, when changed, confer drug resistance158. On the basis of these findings, it was postulated that if PR inhibitors were designed to bind precisely within the substrate envelope (and contact only those residues that are essential for PR function), then resistance mutations would be unfavourable, as they would destroy the functional activity (substrate-binding capacity) of PR. In support of this hypothesis, some novel amprenavir-based compounds do indeed display marginally improved binding profiles to drug-resistant PR compared with their binding profiles to wild-type enzyme in vitro159. Because compounds with enhanced binding affinities for wild-type PR bind drug-resistant enzymes less well than amprenavir, additional work is required to determine whether substrate envelope-based PR inhibitors will display beneficial profiles against drug-resistant strains in the clinic.
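
The substrate-envelope argument is, at its core, a comparison of volumes. The toy sketch below makes that comparison concrete under assumptions that are ours and are not the published procedure: atoms are modelled as hard spheres of a single hypothetical radius (1.5 Å) sampled on a cubic grid (0.5 Å spacing), the envelope is taken to be the grid points occupied by at least `min_count` of the aligned substrates, and an inhibitor's protrusion is the fraction of its grid points that fall outside that consensus volume. The coordinates are invented toy values rather than PR crystal structures, and all function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import ceil, floor

Atom = tuple[float, float, float]   # (x, y, z), toy units of angstroms
Voxel = tuple[int, int, int]        # integer grid indices


def occupied_voxels(atoms: list[Atom], radius: float = 1.5,
                    spacing: float = 0.5) -> set[Voxel]:
    """Grid points lying within `radius` of any atom (hard-sphere model)."""
    voxels: set[Voxel] = set()
    for x, y, z in atoms:
        # Only grid indices inside the atom's bounding cube need checking.
        ranges = [range(floor((c - radius) / spacing),
                        ceil((c + radius) / spacing) + 1) for c in (x, y, z)]
        for i, j, k in product(*ranges):
            d2 = (i * spacing - x) ** 2 + (j * spacing - y) ** 2 + (k * spacing - z) ** 2
            if d2 <= radius ** 2:
                voxels.add((i, j, k))
    return voxels


def substrate_envelope(substrates: list[list[Atom]], min_count: int,
                       radius: float = 1.5, spacing: float = 0.5) -> set[Voxel]:
    """Consensus volume: grid points occupied by at least `min_count` substrates."""
    counts: Counter = Counter()
    for atoms in substrates:
        # Each substrate contributes at most one count per grid point.
        counts.update(occupied_voxels(atoms, radius, spacing))
    return {v for v, n in counts.items() if n >= min_count}


def protrusion_fraction(inhibitor: list[Atom], envelope: set[Voxel],
                        radius: float = 1.5, spacing: float = 0.5) -> float:
    """Fraction of the inhibitor's grid points that fall outside the envelope."""
    occupied = occupied_voxels(inhibitor, radius, spacing)
    return len(occupied - envelope) / len(occupied)


if __name__ == "__main__":
    # Toy coordinates: three aligned "substrates" share a backbone,
    # and only the third carries an extra side group at (2, 3, 0).
    backbone = [(float(x), 0.0, 0.0) for x in range(5)]
    substrates = [backbone, backbone, backbone + [(2.0, 3.0, 0.0)]]
    envelope = substrate_envelope(substrates, min_count=2)

    fits = [(float(x), 0.0, 0.0) for x in range(1, 4)]
    protrudes = fits + [(2.0, 3.0, 0.0)]
    print(protrusion_fraction(fits, envelope))
    print(protrusion_fraction(protrudes, envelope))
```

In this toy, an inhibitor confined to the shared backbone volume has a protrusion fraction of 0.0, whereas adding a group that only one substrate occupies pushes part of the inhibitor outside the envelope. In the hypothesis described above, it is the contacts made by such protruding regions that resistance mutations could remove without compromising substrate processing.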

Conclusions and perspectives

HIV-1 has been analysed by structural biology techniques more than any other virus, with partial or complete structures known for all 15 of its protein components and additional structures determined for substrate- and host factor-bound complexes. Structural biology will continue to have a significant impact on HIV/AIDS research by providing high-resolution glimpses of target protein–drug complexes and virus–host interactions, such as CA–TRIM5α, Vif–APOBEC3G or Vpu–tetherin, and this will reveal novel druggable sites. Despite decades of research, the interactions between HIV-1 and host proteins that underlie some steps in the viral life cycle (for example, import of the pre-integration complex into the nucleus; Fig. 1, step 5) are only now being illuminated. The simian immunodeficiency virus Vpx protein was recently shown to counteract SAMHD1, a restriction factor that inhibits HIV-1 reverse transcription and infection of monocytic cells160,161, suggesting that such virus–host protein complexes could also define new paradigms for antiviral drug development.

Beyond the ongoing work with PR inhibitors, it will be interesting to see whether structure-based substrate–inhibitor envelope approaches also apply to the development of other HIV-1 inhibitors. Because NNRTIs bind to an induced-fit pocket rather than to the substrate-binding site, they would appear to be poor candidates for this approach. The tight overlay of multiple bound drugs at the IN active site and the similarities in drug positions with the ejected terminal adenosine base88 hint that INSTIs could be another drug class to benefit from such approaches. Three-dimensional structures of new drug targets, as well as of inhibitor- or antibody-bound targets, can be expected to increase the pace of antiviral development and help guide vaccine development efforts162,163. The advent of new technologies and improvements in existing methods will also significantly influence structural virology. Single-particle electron cryo-microscopy has recently yielded near atomic-resolution structures of a number of so-called naked viruses, which, unlike HIV-1, lack an exterior envelope lipid bilayer164. Although the icosahedral symmetry underlying these structures greatly facilitated their determination, ongoing improvements in instrumentation and computational science may well yield structures of similar resolution for particles that possess less inherent symmetry.
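
Why icosahedral symmetry helps can be stated as simple counting. The icosahedral rotation group I has 60 elements, so each particle image supplies 60 copies of the asymmetric unit for averaging, whereas an asymmetric particle (point group C1) supplies one. Assuming independent noise in each averaged copy (an idealization that ignores alignment errors and particle heterogeneity), the signal-to-noise ratio of the average grows roughly with the square root of the number of copies:

```latex
N_{\mathrm{copies}} = |G|\,N_{\mathrm{particles}}, \qquad |I| = 60, \quad |C_1| = 1,
\qquad
\mathrm{SNR}_{\mathrm{avg}} \approx \sqrt{N_{\mathrm{copies}}}\;\mathrm{SNR}_{\mathrm{single}}
```

For the same number of particle images, an asymmetric particle therefore yields 60-fold fewer averaged copies and, in this idealized picture, about a 7.7-fold (\sqrt{60}) lower signal-to-noise ratio, which is consistent with the point that particles with less inherent symmetry are harder to solve at comparable resolution.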

The development of HAART has dramatically changed the face of the HIV/AIDS epidemic since the disease was first recognized 30 years ago. Considered virtually a death sentence before the advent of antiretroviral drugs, HIV-1 infection is now a manageable chronic disease. Nonetheless, there remains significant room for improvement. Some of the drugs, in particular the PR inhibitors, cause toxic side effects. More tolerable antiviral regimens could improve patient compliance and consequently reduce the emergence of resistant strains. Although the recently approved INSTI raltegravir is relatively non-toxic, the ease with which it selects for drug-resistant strains highlights the need for second-generation INSTIs with higher genetic barriers to resistance. The development of compounds that inhibit less explored drug targets, in particular the accessory HIV-1 proteins and host factors, would clearly also be beneficial. The availability and efficacy of the current arsenal of antiretroviral drugs should not be taken for granted: the majority of people infected with HIV do not have access to advanced treatment options. Short of an effective vaccination strategy, the ongoing race against drug resistance can best be won by a sustained effort to develop novel, ever more potent and affordable antiviral treatments.