Main

Developmental pathways of antibodies that neutralize HIV-1 represent potential templates to guide vaccine strategies, if their constituent molecular events were understood and could be reproduced1,2,3. Almost all HIV-1 infected individuals mount a potent antibody response within months of infection, but this response preferentially neutralizes autologous virus, which rapidly escapes4,5. Cross-reactive antibodies capable of neutralizing most HIV-1 strains arise in only 20% of donors after 2–3 years of infection6,7,8,9. An understanding of the development of broadly neutralizing antibody (NAb) lineages in such donors could provide a roadmap for vaccine design.

One means to obtain such a roadmap is through isolation of broadly cross-reactive neutralizing antibodies, characterization of their genetic sequence and molecular properties, and examination of the B cell genetic record with next-generation sequencing (NGS)10,11,12,13,14. The greatest insights can be gained with longitudinal sampling from early after the time of HIV-1 infection15. This allows for a genetic delineation of the molecular evolution leading from an unmutated ancestor antibody, through affinity maturation, to acquisition of neutralization breadth. In principle, such a roadmap should link antibody molecular characteristics to the genetic development that a successful vaccine would retrace.

Neutralizing antibodies to the V1V2 region of the HIV-1 viral spike are among the most prevalent cross-reactive antibodies elicited by natural infection6,16,17,18 and have been isolated from several donors19,20,21. These antibodies have long heavy-chain complementarity-determining region 3 loops (CDR H3s) that are protruding, anionic and often tyrosine sulphated22,23. These CDR H3s penetrate the HIV-1 glycan shield, recognizing a quaternary glycopeptide epitope at the apex of the HIV-1 spike that is formed by V1V2s from at least two gp120 protomers22,23,24. Here we use antibody isolation, B-cell next-generation sequencing, structural characterization, and viral single-genome amplification (SGA) to delineate longitudinal interactions between the developing antibody and autologous virus within donor CAP256, who showed evidence of V1V2-mediated neutralization breadth after one year18,25,26. Our results define the molecular requirements and genetic pathways that lead to V1V2-directed neutralization, providing a template for their vaccine elicitation.

Antibody isolation and characterization

Donor CAP256 peripheral blood mononuclear cells (PBMCs) sampled 59, 119 and 206 weeks post-infection were used to isolate 12 monoclonal antibodies by high-throughput B-cell culture, functional screening by microneutralization, and PCR with reverse transcription (RT–PCR) of antibody variable regions27,28 (Fig. 1a). All 12 were somatically related and distinguished by long CDR H3s of 35–37 amino acids (Kabat29 numbering) (Fig. 1b and Extended Data Fig. 1a). The heavy and light chains exhibited somatic mutation of 4–15% from their germline-encoded V-genes, VH3-30 and Vλ1-51, respectively (Extended Data Fig. 1 and Extended Data Table 1). When these antibodies were reconstituted as IgG1s, they showed varying degrees of heterologous virus neutralization and were extremely potent against many subtype A and C strains (Fig. 1b, c, Extended Data Fig. 2 and Supplementary Fig. 1). The combination of all 12 antibodies recapitulated plasma neutralization (Supplementary Fig. 2), indicating the CAP256-VRC26 antibody lineage to be responsible for the neutralization breadth and potency of donor CAP256.

Figure 1: Development of broad neutralization by donor CAP256 and isolation of neutralizing antibodies.

a, Timing of antibody isolation in relation to plasma neutralization titres against the primary infecting virus (PI), the superinfecting virus (SU), and a panel of 40 heterologous viruses (geometric mean titre shown). Percentage breadth (grey area), percentage of viruses neutralized with plasma median inhibitory dilution (ID50) > 45. b, Genetic characteristics and neutralization breadth and potency of the 12 isolated antibodies. Week of antibody isolation and V-gene mutation rates are indicated. Residues flanking the Kabat-defined CDR H3 sequences are shown in light grey. Neutralization was assessed against a panel of 47 heterologous viruses. c, Breadth and potency of antibody CAP256-VRC26.08 on a panel of 194 Env-pseudoviruses. Dendrogram shows phylogenetic relatedness of Env sequences in the panel.

To map the epitope of the CAP256-VRC26 antibodies, we used neutralization fingerprints18; binding assays for HIV-1 Envelope (Env) in soluble, cell surface30, and viral particle31 contexts; and negative stain electron microscopy (EM) of Fab CAP256-VRC26.09 bound to a soluble cleaved version of the HIV-1 trimer24,32,33 (Fig. 2a–c, Supplementary Fig. 3 and Extended Data Figs 3 and 4). Recognition of Env by CAP256-VRC26 antibodies was similar to PG9-class neutralizing antibodies that recognize the trimeric V1V2 cap24, with high specificity for the Env native quaternary conformation and one Fab bound per trimer (Fig. 2c, left and Extended Data Fig. 4). Neutralization activity of CAP256-VRC26 antibodies was reduced or knocked out by Env mutations in V1V2 strands B and C (Fig. 2d), much like the CAP256 plasma25,26 and PG9-class neutralizing antibodies22,23,34; although unlike PG9, the CAP256-VRC26 antibodies were only partially and variably sensitive to loss of glycans at N160 and N156 (Fig. 2d and Extended Data Fig. 5). Overall, these data indicated the epitope to be at the membrane-distal apex of the HIV-1 spike close to the trimer axis (Fig. 2e), providing a structural explanation for the observed quaternary specificity.

Figure 2: Mapping of CAP256-VRC26 epitope on the HIV-1 Env spike.

a, Correlations between neutralization fingerprints (see Methods) of CAP256-VRC26 antibodies and CAP256 plasma (left). Darker grey indicates stronger correlation. Correlations between neutralization fingerprints of CAP256-VRC26 antibodies and representative antibodies targeting the major HIV-1 neutralization epitopes (right). Correlations are colour-coded by antibody; darker shades indicate stronger correlations. b, Competition assay. Binding to ZM53-Env-expressing 293T cells by labelled CAP256-VRC26.08 and unlabelled competitor antibodies measured by flow cytometry. Assay shown is representative of three experiments. c, Negative stain electron microscopy (EM) 3D reconstruction of CAP256-VRC26.09 Fab in complex with soluble cleaved BG505 SOSIP.664 trimer (left); 2D-class averages of VRC26.09 and PG9 in complex with BG505 SOSIP.664 trimer (right). d, Neutralization of Env-pseudoviruses with HIV-ConC and V2 point mutants. Sequence shows amino acids 160–175. e, Location of HIV-1 epitopes. EM density of viral spike50, with viral membrane at top and major sites of vulnerability shown as determined by structural mapping of antibody interactions24 (left). The gp41 membrane proximal external region (MPER) is shown schematically. Model of V1V2 based on EM structure of BG505 SOSIP.664 trimer24,32, viewed looking towards the viral membrane along the trimer axis (right). Green ribbon, strand C. V2 mutations from panel d are shown with surface representation; brighter green indicates more potent effects on neutralization.

Origin and development of the lineage

To obtain a genetic record of the CAP256-VRC26 antibody lineage, we analysed B cell-immunoglobulin transcripts at eight time points between 15 and 206 weeks post-infection by 454 pyrosequencing. Although no CAP256-VRC26 lineage-related transcripts were detected at 15 and 30 weeks, related heavy chain and light chain transcripts were found at all later time points (Fig. 3a). To track longitudinal prevalence, we used identity-divergence plots12 of all heavy chain reads assigned to the same VH3-30 germline gene as the isolated antibodies. Using CAP256-VRC26.01 or CAP256-VRC26.08 as the identity referents, segregated islands of related heavy chain sequences first appeared at week 38 (Fig. 3b). For all 12 antibodies, the prevalence and identity of related sequences peaked close to the time of the antibody isolation (Supplementary Fig. 4). To obtain additional antibody lineage data, we performed linked Vh:Vl paired sequencing35 at five time points (Fig. 3a and Supplementary Table 1). Of 157 unique CAP256-VRC26 pairs, 7 matched either heavy or light chain sequences present in the 454 pyrosequencing data, including 2 for which both heavy and light chain sequences had previously been captured (Fig. 3c).
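
The two coordinates of an identity-divergence plot can be illustrated with a minimal sketch. It assumes reads that are already aligned to equal length and uses made-up toy sequences; the function names are hypothetical, and the simple position-by-position comparison stands in for the alignment-based procedure of ref. 12 rather than reproducing it.

```python
from typing import Tuple


def percent_identity(a: str, b: str) -> float:
    """Percentage of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def identity_divergence(read: str, referent: str, germline: str) -> Tuple[float, float]:
    """Return (identity to the referent antibody, divergence from the germline V gene)."""
    identity = percent_identity(read, referent)
    divergence = 100.0 - percent_identity(read, germline)
    return identity, divergence


# Toy 10-nucleotide sequences for illustration only (not CAP256 data).
germline = "ACGTACGTAC"
referent = "ACGTTCGTAA"
read = "ACGTTCGTAC"

# One point on the plot: vertical axis = identity, horizontal axis = divergence.
point = identity_divergence(read, referent, germline)
```

Each read contributes one such point, and reads belonging to the lineage cluster as islands of high identity to the referent at a given germline divergence.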

Figure 3: Maturation of the CAP256-VRC26 lineage revealed by NGS and Vh:Vl paired sequencing of B cell transcripts.

a, Timeline of longitudinal peripheral blood samples with quantification of all NGS sequence reads (total and unique), and CAP256-VRC26 lineage-related reads (total and unique). Arrows below the line indicate time points of 454 pyrosequencing for heavy and light chain sequences. Circles indicate time points of paired sequencing of sorted B cells (see Methods). PCR amplifications for pyrosequencing used primers specific for VH3 family sequences (heavy chain) and V lambda sequences (light chain), with the exception of the week 176 sample (asterisk), which was amplified using all-VH gene primers, resulting in fewer CAP256-VRC26 specific reads. b, Maturation time course for CAP256-VRC26.01 (top) and CAP256-VRC26.08 (bottom panels). Heat map plots show sequence identity (vertical axis) versus germline divergence (horizontal axis) for NGS data. The 12 isolated antibodies are displayed as red ‘x’ marks for reference, with the exception of the CAP256-VRC26.01 and 08 antibodies which are shown as black dots. Numbers between the top and bottom panels correspond to the number of raw reads with at least 85% identity to the indicated antibody (VRC26.01 (top), VRC26.08 (bottom)). c, Phylogenetic trees of the CAP256-VRC26 clonal lineage for heavy chain (left) and light chain (right) were constructed by maximum likelihood using the 454 sequences and the isolated antibodies (black dots, labelled with antibody name). Branches are coloured by time point when NGS sequences were first detected. The orange and blue circles indicate linked heavy and light chain sequences from the paired sequencing data. Scale, rate of nucleotide change (per site) between nodes.

Maximum-likelihood phylogenetic trees were constructed using the isolated antibodies and the 454 data (Fig. 3c). The lineage bifurcates early, with one branch leading to CAP256-VRC26.01 and a second developing into CAP256-VRC26.02–12. The unmutated common ancestors (UCAs) for the heavy and light chain were inferred from the phylogenetic trees (Fig. 3c). For the light chain, the UCA had a 12-residue CDR L3, as in CAP256-VRC26.01, and for the heavy chain, the inferred UCA had a 35-residue CDR H3 (Extended Data Fig. 6), probably the result of VDJ recombination with a single D-gene, IgHD3-3*01 and non-templated (N)-nucleotide insertions of 34 and 31 nucleotides at each junction (Supplementary Fig. 5). This inferred UCA was further supported by very-low-divergence sequences among the lineage members identified from the week 38 heavy chain data. Five unique sequences were found, all of which had CDR H3s matching the inferred UCA in at least 30 of 35 amino acids while containing three or fewer nucleotide changes in VH and JH combined (Extended Data Fig. 6). Thus, the longitudinal NGS analysis established the first appearance of the CAP256-VRC26 lineage; defined the UCA, the product of gene recombination in the ancestor B cell of the lineage; and provided a genetic record of the development of this lineage over four years.

Structures of CAP256-VRC26 antibodies

To define the structural characteristics of CAP256-VRC26 lineage development, we determined crystal structures for Fabs of the UCA and six antibodies from weeks 59, 119 and 206 (Fig. 4, Supplementary Table 2, and Supplementary Fig. 6a). The mature CDR H3s protruded 20Å above the antigen-combining surface of the heavy chain and contained a 2-stranded β-sheet, O-sulphated tyrosines, and an intra-CDR H3 disulphide bond (Fig. 4a, b). The CDR H3s of the UCA and CAP256-VRC26.01 lacked a CDR H3 disulphide bond, exhibited greater disorder and were positioned more proximal to the light chain (Fig. 4c); the appearance of the disulphide bond correlated with adoption of the mature CDR H3 orientation (Fig. 4c, Supplementary Fig. 6b, and Extended Data Fig. 7a). Mutation to remove the relevant cysteine residues in VRC26.03 resulted in loss of neutralization potency and breadth (Extended Data Fig. 7b, c). Additionally, the appearance of CDR H3 cysteines coincided with a glycine to arginine mutation at the base of the CDR H3, possibly limiting flexibility of the mature antibodies (Extended Data Fig. 7a, b and Supplementary Fig. 7). Overall, the CAP256-VRC26 lineage begins with an anionic protruding CDR H3 with structural properties similar to previously determined V1V2-directed broadly neutralizing antibodies. Development over four years involves the introduction of almost 20 light chain and over 30 heavy chain mutations, including a disulphide bond. The CDR H3 changes its overall orientation while losing negative charge and maintaining tyrosine sulphation (Fig. 4b, c, right).

Figure 4: Structural characteristics of the developing CAP256-VRC26 lineage.

a, Crystal structure of the antigen-binding fragment (Fab) of CAP256-VRC26.03 shown in ribbon diagram representation. b, The intra-loop disulphide bond and tyrosine sulphation are shown in stick representation, and enlarged to show electron density (blue mesh, 2Fo-Fc at 1σ) (left). Molecular surface, with electrostatic potentials coloured red for acidic and blue for basic (right). CDR H3 regions of broadly neutralizing V1V2-directed antibodies are shown for comparison, with the left image in ribbon representation (tyrosine sulphates highlighted in red) and the right image in electrostatic representation. c, A condensed heavy chain phylogenetic tree highlights the isolated antibodies (left). Scale, rate of nucleotide change between nodes. The number of amino acid (aa) mutations to the heavy chain (H) and light chain (L) relative to the UCA are shown. Structures of the variable regions (middle). Mutations from the UCA are represented as spheres coloured according to the week of antibody isolation at which the mutations first appear. CDR H3 details (right). Residues that are (or evolve to become) cysteines are labelled (grey dotted lines indicate modelled disordered regions). The position of tyrosines predicted to be sulphated (scores >1) are noted and were included in the formal charges shown for each CDR H3 and the electrostatic representations (far right). Asterisk denotes Tyr insertion in VRC26.06.

HIV Env evolution during NAb development

To gain insight into the temporal HIV-1 Env changes driving the development of the CAP256-VRC26 lineage, we used SGA to determine viral sequences over 3 years. CAP256 Env sequences showed high levels of diversity driven, in part, by recombination between the superinfecting virus (SU) that was first detected 15 weeks post-infection and the primary infecting virus (PI)26 (Fig. 5a, Supplementary Figs 8, 9). Differences between the primary infecting virus and superinfecting virus Env sequences included V2 residues 165 and 169, and an N160 glycan in the superinfecting virus that was not present in the primary infecting virus (Fig. 5b and Extended Data Fig. 8a, b). Notably, compared to the primary infecting virus, the superinfecting virus contained V2 residues that are more commonly found among circulating viruses (Extended Data Fig. 8a). All 12 antibodies neutralized the superinfecting virus, and, with the exception of CAP256-VRC26.06, failed to neutralize the primary infecting virus, suggesting the superinfecting virus V1V2 initially engaged the naive B cell of the CAP256-VRC26 lineage (Fig. 5d, Extended Data Fig. 8c and Supplementary Fig. 10).

Figure 5: HIV-1 Env evolution and the development of the CAP256-VRC26 lineage.

a, V1V2 sequences are shown in highlighter format with the primary infecting virus (PI) designated as master and V2 residues 160 to 171 boxed. Asterisk at week 15 denotes sequences amplified with strain-specific primers matching the superinfecting virus (SU). b, Logogram of the V2 epitope for all CAP256 sequences, with mutations away from the PI (master sequence) in colour. c, SU-like V1V2 sequences are indicated by black (present) and grey (absent) boxes. Escape mutations (K169E or R166S/K) are indicated by brown boxes. The net charge of the V2 epitope (residues 160 to 171) is shown in purple/white, ranging from +3 to 0. White lines separate clones within a time point; black lines separate time points. d, Neutralization by the 12 CAP256-VRC26 monoclonal antibodies of representative longitudinal Env clones isolated between 6 and 176 weeks post-infection (weeks shown at far right). The CAP256 monoclonal antibodies are coloured by time of isolation (as in Fig. 1). The development of the CAP256-VRC26 antibody lineage, V1V2-directed plasma neutralizing antibodies and plasma heterologous neutralization, are indicated on the right.

Before the CAP256-VRC26 antibodies developed, most Env sequences had V1V2 regions derived from the primary infecting virus (Fig. 5a–c and Supplementary Figs 8 and 9) and were therefore largely neutralization resistant (Fig. 5d and Supplementary Fig. 10). Among superinfecting-virus-like sequences, a rare K169I mutation arose under strong directional selection (Supplementary Table 3) as the CAP256-VRC26 lineage emerged, which rendered the superinfecting virus resistant to only the earliest antibody (Extended Data Fig. 8d, e), indicating that CAP256-VRC26.01-like antibodies drove this viral escape, followed by maturation of the lineage to tolerate I169. At 48 weeks, the viral population underwent a substantial shift (Fig. 5a and Supplementary Figs 8 and 9), with the superinfecting-virus-like V1V2 dominating just before the development of neutralization breadth. Neutralization of Env clones by later antibodies (CAP256-VRC26.02-12) tracked with the presence of superinfecting-virus-like V1V2 sequences (black bar, Fig. 5c) until escape occurred through mutations at positions 166 or 169 (Fig. 5c, d and Extended Data Fig. 8d). These mutations resulted in a net charge change in the V2 epitope (+3 to 0, Fig. 5c, Extended Data Fig. 8b) concomitant with the antibody CDR H3s becoming less acidic over time (−10 to −4, Fig. 4 and Extended Data Fig. 9) suggesting co-evolution of the viral epitope and the antibody paratope. Overall, these results highlight the interplay between virus and antibody, with the superinfecting-virus-like V1V2 epitope stimulating expansion of the CAP256-VRC26 lineage.
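
The net-charge bookkeeping behind this comparison can be sketched as follows. This is an illustrative formal-charge count, assuming +1 for each lysine or arginine and −1 for each aspartate, glutamate or sulphated tyrosine, with histidine and the termini ignored. The function name and the 12-residue peptides are hypothetical and are not CAP256 sequences; they are chosen only so that substitutions at the seventh and tenth positions (standing in for 166 and 169) move the charge from +3 to 0.

```python
POSITIVE = set("KR")  # lysine, arginine
NEGATIVE = set("DE")  # aspartate, glutamate


def net_charge(peptide: str, sulfated_tyrosines: int = 0) -> int:
    """Formal charge: +1 per K/R, -1 per D/E, -1 per sulphated tyrosine.

    Histidine and the peptide termini are ignored (a simplifying assumption).
    """
    charge = 0
    for residue in peptide.upper():
        if residue in POSITIVE:
            charge += 1
        elif residue in NEGATIVE:
            charge -= 1
    return charge - sulfated_tyrosines


# Made-up 12-residue peptides standing in for V2 residues 160-171.
epitope_before = "NATSALRDKKQK"  # R at position 7, K at position 10
epitope_after = "NATSALSDKEQK"   # R->S and K->E substitutions at those positions

charge_before = net_charge(epitope_before)
charge_after = net_charge(epitope_after)
```

The same function, with a non-zero `sulfated_tyrosines` argument, gives the kind of formal charge quoted for a CDR H3 once its predicted sulphated tyrosines are counted.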

Rapid development of CAP256-VRC26.01

To gain insight into the development of V1V2-directed neutralization, we focused on the early antibody CAP256-VRC26.01, isolated at week 59, which neutralized 30% of clade C viruses and showed cross-clade neutralization of nearly 20% (Supplementary Fig. 1). Notably, this week 59 time point was 44 weeks after superinfection and only 21 weeks after the CAP256-VRC26 lineage was first detected by NGS. We also inferred heavy and light chains for two developmental intermediates (VRC26-I1 and VRC26-I2) (Fig. 6a and Extended Data Fig. 1) and characterized their function along with the UCA (Fig. 6b–e). The UCA bound and neutralized the superinfecting virus weakly, but did not bind or neutralize heterologous viruses. VRC26-I1, VRC26-I2 and CAP256-VRC26.01 demonstrated progressively greater binding and neutralization, with VRC26-I1 neutralizing 2 of 7 strains and VRC26-I2 neutralizing 6 of 7 strains (Fig. 6e), with dependence on residues in V2 (Fig. 6c). Interestingly, the primary infecting virus was neither bound nor neutralized by the UCA, intermediates, or CAP256-VRC26.01 (Fig. 6c and Supplementary Fig. 11). These data provide further evidence that the CAP256-VRC26 lineage was initiated by interaction with a superinfecting-virus-like V1V2. Subsequent affinity maturation, focused within CDR H3 (Fig. 6f and Extended Data Table 1), allowed for progressively greater binding and neutralization with increased viral diversity preceding the emergence of neutralization breadth. On the basis of the inferred UCA, CAP256-VRC26.01 diverged 11% from germline heavy chain and 7% from germline light chain (Fig. 6f). Thus, once an appropriate gene recombination allows for B-cell receptor recognition of the trimeric V1V2 epitope, development of cross-reactive neutralization can be achieved with moderate somatic mutation in a matter of months.

Figure 6: Development from UCA to CAP256-VRC26.01.

a, Expanded view of the phylogenetic trees from Fig. 3c, highlighting the maturation pathway of CAP256-VRC26.01. Off-pathway branches were collapsed and are shown as dashed lines. Inferred intermediates VRC26-I1 and VRC26-I2 were expressed for functional analyses. b–e, Binding and neutralization of antibodies UCA, VRC26-I1, VRC26-I2 and VRC26.01. b, d, Binding to cell-surface expressed Env (SU and ZM53). MFI, median fluorescence intensity. c, e, Neutralization of PI, SU and point mutants (c) and seven heterologous viruses (e). Bars, standard error of the mean (triplicates). f, Structural models of VRC26.01 lineage antibodies. Affinity matured residues are shown as spheres coloured according to the intermediate at which they first appear: red, VRC26-I1; orange, VRC26-I2; green, VRC26.01. Grey dots, disordered residues in the CDR H3. The number of changes from the UCA to each intermediate are noted for V gene only (Vh or Vl), or from the full UCA (UCA-HC or UCA-LC).

Vaccine implications

The V1V2 region of HIV-1 is a common target of serum neutralizing antibodies6,16,17,18. In the RV144 Thai vaccine trial, an increased level of binding antibodies to the V1V2 region was associated with a reduced risk of infection36 and viral sieve analysis showed immune pressure in the same region37. Although the vaccine in the RV144 trial did not elicit broadly neutralizing V1V2-directed antibodies similar to those described here and elsewhere19,20,21, a more effective vaccine would ideally elicit cross-reactive neutralizing antibodies1,2,3,38. Previously described V1V2 neutralizing antibodies, and the CAP256-VRC26 lineage, all have long CDR H3 regions that are necessary to penetrate the glycan shield and engage a V1V2 epitope (Extended Data Table 1). An important unanswered question has been whether these long CDR H3s are fully formed by VDJ recombination, as has been seen in HIV-uninfected donors39, or emerge by insertions during the process of affinity maturation. We show here that the 35-residue CDR H3 of the CAP256-VRC26 UCA was produced during initial gene rearrangement and therefore existed at the level of the naive B cell receptor.

A potential rate-limiting developmental step in the CAP256-VRC26 lineage is the gene rearrangement that generated its UCA. By one estimate, human B cells with recombined antibody genes encoding long (≥24 amino acids, international immunogenetics database (IMGT)40 definition) or very long (≥28 amino acids) CDR H3s constitute 3.5% and 0.4%, respectively, of naive B cells39. These long B cell receptors have been associated with autoreactivity, and are subject to both central and peripheral deletion, resulting in an even smaller population of IgG+ memory B cells39,41. We therefore tested the UCA and all 12 CAP256-VRC26 cloned antibodies for autoreactivity42. The UCA and mature CAP256-VRC26 antibodies demonstrated little or no reactivity with Hep2 cells or with cardiolipin (Extended Data Fig. 6b, c). In addition, NGS of CAP256 peripheral B cells indicated that <0.4% of sequences had CDR H3s of ≥ 28 amino acids (Extended Data Fig. 6d) suggesting that this donor did not have an unusually high frequency of clonal lineages with long CDR H3 regions.

We also inferred the virological events leading to the stimulation and evolution of the CAP256-VRC26 lineage by the superinfecting virus. Similar to the CH103 CD4-binding site lineage in donor CH505 (ref. 15), the autologous virus in CAP256 showed extensive diversification before the development of breadth. Subsequent antibody–virus interactions appeared to drive somatic mutation and development of cross-reactive neutralization. Finally, the ontogeny of V1V2-directed neutralizing antibodies revealed by the CAP256-VRC26 lineage indicates that neutralization potency and breadth can be achieved without extraordinary levels of somatic hypermutation. Although some neutralizing antibodies appear to require years of maturation1,3,43,44, we show that a V1V2-directed B cell lineage can acquire HIV-1 neutralization breadth within months rather than years. The critical event appears to be an uncommon gene rearrangement that produces a B-cell receptor with a protruding, tyrosine-sulphated, anionic CDR H3. Identifying features of antigens able to engage naive B cells with such CDR H3s is a critical step in design of vaccines targeting V1V2. Such antigens could be screened for binding to the UCA versions of neutralizing antibodies as an indicator of the ability to engage an appropriate naive B cell receptor. This work also suggests that although an appropriate trimeric V1V2 construct may elicit neutralizing V1V2 antibodies, sequential immunogens that mirror viral evolution may be needed to drive the development of breadth. Overall, the precise delineation of the developmental pathway for the CAP256-VRC26 lineage should provide a basis for attempts to elicit broad V1V2-directed HIV-1-neutralizing antibodies.

Methods Summary

Serial blood samples were collected from HIV-1-infected subject CAP256 from 6 to 225 weeks after infection. Monoclonal antibodies CAP256-VRC26.01-12 were generated by single-B cell culture, microneutralization screening, RT–PCR, subcloning, and expression as described previously27,28,45. CDR lengths used Kabat notation29 except as indicated. Binding of CAP256-VRC26 antibodies to virus-like particles was assessed by ELISA31 and binding to cell-surface expressed Env was measured by flow cytometry30. HIV-1 neutralizing activity of patient plasma and monoclonal antibodies was determined with Env-pseudoviruses using the TZM-bl cell line46,47. Neutralization fingerprints are the rank-order of neutralization potencies for an antibody against a set of diverse viral strains, calculated as in ref. 18. A 28Å reconstruction of the BG505 SOSIP.664 gp140 trimer with a single VRC26.09 Fab was obtained by negative stain EM using Appion, Xmipp, IMAGIC, and EMAN software. 454 pyrosequencing was performed as previously described12,14 on samples from 8 time points after HIV-1 infection. High-throughput Vh:Vl pairing of peripheral blood CD27+ B cells was performed in single cell emulsions generated using a flow focusing apparatus35. Phylogenetic analysis, inference of UCA, and identification, synthesis, and expression of clone members were performed as described in the Methods. Epitope mapping onto the spike trimer was performed with the software package UCSF Chimera, using experimental data as described in Methods. Crystallographic analysis of Fab fragments was performed as described in the Methods. Structure modelling of disordered residues in Fab crystal structures was performed using Loopy software. Single-genome amplification and expression of env genes were performed as described in Methods and in refs 48, 49.

Online Methods

Study subject

CAPRISA participant CAP256 was enrolled into the CAPRISA acute infection study51 that was established in 2004 in KwaZulu-Natal, South Africa for follow-up and subsequent identification of HIV seroconversion. CAP256 was one of the 7 women in this cohort who developed neutralization breadth6. The CAPRISA 002 acute infection study was reviewed and approved by the research ethics committees of the University of KwaZulu-Natal (E013/04), the University of Cape Town (025/2004), and the University of the Witwatersrand (MM040202). CAP256 provided written informed consent for study participation. Samples were drawn between 2005 and 2009.

Isolation and expression of CAP256-VRC26 family genes

PBMC isolated from CAP256 blood draws at weeks 59, 119 and 206 were stained and sorted for IgG+ B cells on a FACS Aria II as described in ref. 18. Cells were plated at two B cells per well in 384-well plates and cultured for 14 days in the presence of IL-2, IL-21, and CD40L-expressing irradiated feeder cells, as described in refs 27 and 45. Culture supernatants were screened by microneutralization as described in ref. 52 against HIV-1 ZM53.12 and either CAP45.G3 or CAP210.E8 Env-pseudoviruses. Kappa and lambda light chain gene and IgG heavy chain gene variable regions were amplified from neutralization-positive wells, subcloned, expressed and purified as described in ref. 18. Heavy chains were reconstituted as IgG1. The efficiency of cloning was as follows. For week 59, a total of 15,000 B cells (7,500 wells) were plated, 8.3% of wells produced IgG, 4 were positive in microneutralization, and one heavy-light chain pair was recovered. For week 119, a total of 45,000 B cells were plated, 48% of wells produced IgG, 49 wells were positive in microneutralization, and 8 heavy-light chain pairs were recovered. For week 206, a total of 42,000 B cells were plated, 29% of wells produced IgG, 34 wells were positive in microneutralization and 3 heavy-light chain pairs were recovered.
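
The plating and recovery figures above can be tallied with a short sketch. The class and attribute names are hypothetical; the numbers are those stated in the text, and the well count assumes exactly two B cells per well at every time point.

```python
from dataclasses import dataclass

CELLS_PER_WELL = 2  # plating density stated in the text


@dataclass(frozen=True)
class SortYield:
    week: int
    cells_plated: int
    igg_positive_fraction: float  # fraction of wells producing IgG
    neutralizing_wells: int       # wells positive in microneutralization
    pairs_recovered: int          # heavy-light chain pairs recovered

    @property
    def wells(self) -> int:
        return self.cells_plated // CELLS_PER_WELL

    @property
    def igg_wells(self) -> float:
        return self.wells * self.igg_positive_fraction

    @property
    def recovery_rate(self) -> float:
        """Heavy-light pairs recovered per neutralization-positive well."""
        return self.pairs_recovered / self.neutralizing_wells


sorts = [
    SortYield(59, 15_000, 0.083, 4, 1),
    SortYield(119, 45_000, 0.48, 49, 8),
    SortYield(206, 42_000, 0.29, 34, 3),
]

# The three sorts together should account for all isolated antibodies.
total_pairs = sum(s.pairs_recovered for s in sorts)

for s in sorts:
    print(f"week {s.week}: {s.wells} wells, "
          f"{s.recovery_rate:.0%} of neutralizing wells yielded a pair")
```

The total of recovered pairs across the three time points equals the 12 antibodies of the lineage.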

The antibodies are numbered CAP256-VRC26.01-.12 in order of the time point of the sample from which they were isolated, and then the degree of heavy-chain somatic mutation.

Neutralization assays

Single round of replication Env-pseudoviruses were prepared, titred and used to infect TZM-bl target cells as described previously46,47. Neutralization breadth of CAP256-VRC26.01, .03, .06, and .08 was determined using a previously described18,53 panel of 194 geographically and genetically diverse Env-pseudoviruses representing the major subtypes and circulating recombinant forms. The remaining antibodies were assayed on a subset of this panel. The data were calculated as a reduction in luminescence units compared with control wells, and reported as half-maximum inhibitory concentration (IC50) in micrograms per millilitre for monoclonal antibodies, or reciprocal dilution (ID50) for plasma samples.
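
As an illustration of how a half-maximum inhibitory concentration can be read from such data, the sketch below converts luminescence to percent neutralization and interpolates the 50% point on a log scale. The background subtraction using cell-only wells and the log-linear interpolation are assumptions made for this example, not a description of the published assay analysis; the function names and titration values are hypothetical.

```python
import math
from typing import Optional, Sequence


def percent_neutralization(rlu: float, virus_control: float, cell_control: float) -> float:
    """Percent reduction in luminescence relative to virus-only wells,
    after subtracting the cell-only background (assumed controls)."""
    return 100.0 * (1.0 - (rlu - cell_control) / (virus_control - cell_control))


def ic50(concentrations: Sequence[float], neutralization: Sequence[float]) -> Optional[float]:
    """Interpolate the concentration giving 50% neutralization on a log10 scale.

    Returns None if no adjacent pair of tested concentrations brackets 50%.
    The result is in the same units as the input concentrations.
    """
    points = sorted(zip(concentrations, neutralization))
    for (c_lo, n_lo), (c_hi, n_hi) in zip(points, points[1:]):
        if n_lo < 50.0 <= n_hi:
            fraction = (50.0 - n_lo) / (n_hi - n_lo)
            log_c = math.log10(c_lo) + fraction * (math.log10(c_hi) - math.log10(c_lo))
            return 10.0 ** log_c
    return None


# Made-up titration for illustration only.
concentrations = [0.01, 0.1, 1.0, 10.0]
neutralization = [10.0, 30.0, 70.0, 95.0]
estimate = ic50(concentrations, neutralization)
```

For plasma, the same interpolation applied to reciprocal dilutions would give an ID50.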

Neutralization fingerprints

Owing to the high sequence variability of HIV-1 Env, different viral strains may exhibit different neutralization sensitivities to the same antibody, and this pattern of neutralization variation can be used to define the neutralization fingerprint for a given antibody. Namely, the neutralization fingerprint of an antibody is defined as the rank-order of neutralization potencies for the antibody against a set of diverse viral strains18.

The correlations between the neutralization fingerprints of the CAP256-VRC26 antibodies and the neutralization patterns of four longitudinal serum time points (at 59, 106, 159, and 220 weeks post-infection) were computed over a set of 29 HIV-1 strains (6535.3, AC10.29, CAAN.A2, CAP210.E8, CAP244.D3, CAP45.G3, DU156.12, DU172.17, DU422.01, PVO.04, Q168.a2, Q23.17, Q259.d2.17, Q461.e2, Q769.d22, Q842.d12, QH0692.42, REJO.67, RHPA.7, SC422.8, THRO.18, TRJO.58, TRO.11, WITO.33, ZM109.4, ZM135.10a, ZM197.7, ZM233.6, ZM53.12)18. The correlations between the neutralization potencies of the CAP256-VRC26 antibodies and a reference set of antibodies targeting the four major sites of vulnerability, with at most two antibodies per unique donor, were computed over a set of 41 HIV-1 strains (6535.3, 0260.v5.c36, 6405.v4.c34, AC10.29, C1080.c3, CAAN.A2, CAP210.E8, CAP244.D3, CAP45.G3, CNE3, DU156.12, DU172.17, DU422.01, KER2008.12, KER2018.11, MB201.A1, MB539.2B7, PVO.04, Q168.a2, Q23.17, Q259.17, Q461.e2, Q769.d22, Q842.d12, QH0692.42, REJO.67, RHPA.7, RW020.2, SC422.8, TH976.17, THRO.18, TRJO.58, TRO.11, UG037.8, WITO.33, ZM109.4, ZM135.10a, ZM197.7, ZM214.15, ZM249.1, ZM53.12). The correlations between the neutralization patterns of the four longitudinal serum time points and the neutralization fingerprints of the reference antibodies were computed over a set of 28 HIV-1 strains (6535.3, AC10.29, CAAN.A2, CAP210.E8, CAP244.D3, CAP45.G3, DU156.12, DU172.17, DU422.01, PVO.04, Q168.a2, Q23.17, Q259.17, Q461.e2, Q769.d22, Q842.d12, QH0692.42, REJO.67, RHPA.7, SC422.8, THRO.18, TRJO.58, TRO.11, WITO.33, ZM109.4, ZM135.10a, ZM197.7, ZM53.12). For the reference antibodies, data from multiple neutralization experiments were averaged and consolidated. All correlations are based on the Spearman’s rank correlation coefficient.
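
A Spearman correlation between two fingerprints can be sketched in a few lines of standard-library Python: each potency vector is converted to ranks (tied values share their mean rank) and the Pearson correlation of the ranks is taken. The function names and the five-strain potency vectors are hypothetical and serve only to illustrate the calculation.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least two strains")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all values are tied")
    return cov / (var_x * var_y) ** 0.5


# Made-up potencies of three antibodies against the same five strains.
antibody_a = [0.01, 0.5, 2.0, 50.0, 0.1]
antibody_b = [0.02, 0.3, 5.0, 40.0, 0.2]   # same rank order as antibody_a
antibody_c = [50.0, 2.0, 0.5, 0.01, 10.0]  # reversed rank order

rho_ab = spearman(antibody_a, antibody_b)
rho_ac = spearman(antibody_a, antibody_c)
```

Because only rank order enters the calculation, two antibodies with different absolute potencies but the same pattern of sensitive and resistant strains correlate perfectly.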

Virus-like particle ELISA

VLP ELISAs were performed as described previously31. Briefly, VLPs were produced by PEI-based cotransfection of 293T cells with a pCAGGS-based, Env-expressing plasmid and the Env-deficient HIV-1 genomic backbone plasmid pNL-LucR-E. VLPs were coated on ELISA wells at 20× the concentration in transfection supernatants. Monoclonal antibody binding was then assessed by ELISA, omitting detergent in PBS wash buffers and probing with an anti-human Fc alkaline phosphatase conjugate (Accurate, Westbury, NY) and SigmaFAST p-nitrophenyl phosphate tablets (Sigma). Plates were read at 405 nm.

Cell-surface Env binding

293T cells were transiently transfected with plasmids encoding Env ZM53.12 or CAP256-SU with deletions of the cytoplasmic tail30. For binding experiments: after 2 days, the cells were stained with ViVid viability dye (Invitrogen) followed by serial dilutions of antibodies, two washes with PBS/5% FBS, then R-PE-conjugated F(ab) goat anti-human IgG specific for the Fc fragment (Jackson ImmunoResearch) at a 1:200 dilution54. For competition assays, the cells were stained with ViVid viability dye followed by biotinylated CAP256-VRC26.08 (0.8 μg ml−1) premixed with serially diluted unlabelled competitor antibodies. After incubation and 2 washes, cells were stained with streptavidin-PE (Invitrogen) at 1:200 dilution. Cells were analysed on a BD LSRII (Becton Dickinson). Binding was measured as the median fluorescence intensity (MFI) for each sample minus the MFI of cells stained with secondary antibody only.

Polyreactivity analysis of antibodies

Antibody binding to cardiolipin was determined as in ref. 42. Briefly, using the QUANTA Lite ACA IgG III ELISA kit (Zeus Scientific) per manufacturer’s protocol, each antibody was diluted to 100 μg ml−1 in the kit sample diluent and tested in threefold serial dilutions. Results shown are representative of at least two independent ELISAs. Positive and negative controls were included on each plate, and values three times above background were considered positive. Antibody reactivity to a human epithelial cell line (HEp-2) was determined with the ANA/HEp-2 Cell Culture IFA Test System (Zeus Scientific) per manufacturer’s protocol, as described in ref. 42. Antibodies were diluted to 50 μg ml−1 and 25 μg ml−1 in ZOBRA-NS diluent. Positive and negative controls were included on each slide. Antibodies were scored negative, indeterminate, or positive (1+ to 4+) at each dilution. Results are representative of at least two independent experiments.

Electron microscopy (EM) and image processing

VRC26.09 Fabs in complex with BG505 SOSIP.664 gp140 trimer produced in HEK 293S cells were analysed by negative stain EM. A 3 μl aliquot of 8 μg ml−1 of the complex was applied for 15 s onto a glow-discharged, carbon-coated 400 Cu mesh grid and stained with 2% uranyl formate for 20 s. Grids were imaged using a FEI Tecnai T12 electron microscope operating at 120 kV at 52,000× magnification and an electron dose of 25 e−/Å2, which resulted in a pixel size of 2.05 Å at the specimen plane. Images were acquired with a Tietz 4k × 4k CCD camera in 5° tilt increments from 0° to 50° at a defocus of 1,000 nm using LEGINON55.

Particles were picked automatically using DoG Picker and put into a particle stack using the Appion software package56,57. Initial reference-free 2D class averages were calculated using particles binned by 2 via Xmipp Clustering 2D Alignment and sorted into 128 classes58. Particles corresponding to the complexes were selected into a substack and another round of reference-free alignment was carried out with unbinned particles using Xmipp Clustering 2D Alignment and IMAGIC software59. To generate an ab initio 3D starting model, a template stack of 44 images of 2D class averages was used without imposing symmetry. The resulting starting model was refined against 2D class averages for 9 cycles and subsequently with 6,763 raw particles for 9 cycles using EMAN60. The resolution of the final reconstruction was calculated to be 28 Å using an FSC cut-off value of 0.5.

High-throughput sequencing

Amplicons for 454 next-generation sequencing were prepared as described12,14 with slight modifications as indicated. Briefly, mRNA was prepared from 10–15 million PBMC using an Oligotex kit (Qiagen). cDNA was synthesized using Superscript II reverse transcriptase (Invitrogen) and oligo-dT(12–18) primers. Individual PCR reactions were performed with Phusion polymerase for 30 cycles. Primers (Supplementary Table 4) consisted of pools of 5–7 oligonucleotides specific for all lambda gene families or VH3 family genes, and had adapters for 454 next-generation sequencing. For week 176 only, heavy-chain PCR was performed with primers for all VH families, and mixed lambda and kappa primers were used for light chain (Supplementary Table 4). PCR products were gel-purified (Qiagen). Pyrosequencing of the PCR products was performed on a GS FLX sequencing instrument (Roche-454 Life Sciences, Bradford, CT, USA) on a half chip per reaction (full chips for week 176). On average, 250,000 raw reads were produced.

High-throughput linkage of Vh and Vl transcripts was performed in single cell emulsions generated using a flow focusing apparatus35 (B.J.D., manuscript in preparation). CD27+ B cells were isolated from CAP256 PBMCs collected at 34, 48, 59, 69, and 119 weeks post-infection by magnetic bead sorting (Miltenyi Biotec, Auburn, CA). Cells from weeks 34 and 119 were divided in two groups and half of the cells were analysed with FR1 primers35, while the other half were analysed with leader peptide primers41 (Supplementary Table 5). All other time points were analysed in a single group using only FR1 primers (Supplementary Table 1). Overlap extension RT–PCR was performed as previously reported35, with extension time increased to 125 s. Nested PCR was performed as described previously with a 23-s extension time and PCR products were sequenced using the Illumina 2 × 250 bp MiSeq platform. Raw reads were quality-filtered for an Illumina Q-score of 20 in 50% of bases. VRC26-class Vh and paired Vl sequences were identified via BLAST against CDR-H3 nucleotide sequences of the 12 culture-isolated antibodies.
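
The read-level quality criterion above can be sketched as follows. This is an illustrative reading of the filter, not the script used in the study: it assumes that a read passes when at least 50% of its bases have a Phred score of 20 or higher, and that quality strings use the Phred+33 encoding; the function name and parameters are hypothetical.

```python
def passes_quality(quality_string, min_q=20, min_fraction=0.5, offset=33):
    """Return True if at least min_fraction of bases have Phred score >= min_q.

    quality_string is the per-base quality line of a FASTQ record;
    offset=33 assumes Phred+33 encoding (an assumption of this sketch).
    """
    if not quality_string:
        return False  # an empty read cannot satisfy the criterion
    good = sum(1 for ch in quality_string if ord(ch) - offset >= min_q)
    return good / len(quality_string) >= min_fraction
```

Under this encoding the character "5" corresponds to Q20 and "4" to Q19, so a read whose quality line is "5544" sits exactly on the threshold and is kept.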

Antibodyomics pipeline

Raw 454 data were processed using a pipeline implemented in Python, similar to one we reported previously14. Briefly, reads were filtered for length, keeping only those between 300 and 600 nucleotides. Germline V genes were then assigned to each read using BLAST with empirically optimized parameters. Reads for which no V gene match was found with an e-value ≤ 10−10 were discarded. For reads assigned to any VH3-30 or Vλ1-51 allele (the CAP256-VRC26 germline genes), ClustalW2 (ref. 61) was used to calculate the sequence identity to the germline and each isolated antibody. These data were plotted as density heat maps using ggplot2 in R to produce identity-divergence plots (Fig. 3b and Supplementary Fig. 4).
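
The two initial filters in this pipeline (read length and V-gene e-value) can be expressed compactly. The sketch below is not the published pipeline; it assumes the length window is inclusive at both ends and that the best BLAST e-value for each read has already been parsed, with None standing for a read without any V-gene hit. The function and parameter names are hypothetical.

```python
def keep_read(length, best_evalue, min_len=300, max_len=600, max_evalue=1e-10):
    """Apply the length window and the V-gene e-value cutoff to one read.

    length      -- read length in nucleotides
    best_evalue -- lowest e-value among the read's V-gene hits,
                   or None when BLAST reported no hit at all
    """
    if not (min_len <= length <= max_len):
        return False  # outside the 300-600 nt window
    # A read survives only if a V gene matched with e-value <= max_evalue.
    return best_evalue is not None and best_evalue <= max_evalue
```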

Finding clonally related sequences

Reads that were assigned to the same V genes as CAP256-VRC26, VH3-30 and Vλ1-51, were submitted to IMGT/HighV-QUEST62 (http://www.imgt.org/IMGTindex/IMGTHighV-QUEST.html), and the results, including automated sequence corrections, were used to further sieve for lineage-related sequences. Reads assigned to J genes matching CAP256-VRC26 (JH3 or Jλ1), and having similar divergence (± 15%) in the V and J genes, similar (± 10%) nucleotide and amino acid divergences in the V gene, and containing a continuous open reading frame throughout the entire variable region, were selected for further processing. Next, reads from all time points were pooled and clustered at 97.25% sequence identity (twice the standard deviation of expected 454 sequencing error)14 using CDHit63. For each cluster, a representative sequence was chosen from the earliest possible time point. The choice of cluster representatives from the earliest time points at which they appeared was critical to maintaining information on the chronology of lineage development in subsequent analyses. This procedure yielded 8,485 unique heavy chain and 6,410 unique light chain sequences.
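
The selection of cluster representatives by earliest time point can be sketched as below. The clustering itself is taken as given (cluster labels are an input), and the tuple layout, the function name and the tie-breaking rule within a week (alphabetical read identifier) are assumptions of this example rather than details of the study.

```python
from collections import defaultdict


def earliest_representatives(reads):
    """Pick one read per cluster, preferring the earliest sampling week.

    reads: iterable of (read_id, cluster_id, week) tuples.
    Ties within a week are broken by read_id so the result is deterministic.
    """
    by_cluster = defaultdict(list)
    for read_id, cluster_id, week in reads:
        by_cluster[cluster_id].append((week, read_id))
    # min() on (week, read_id) tuples returns the earliest week first.
    return {cluster: min(members)[1] for cluster, members in by_cluster.items()}
```

Keeping the week alongside each read is what preserves the chronology: a cluster first seen at an early time point is represented by a read from that time point even if most of its members were sampled later.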

To identify CAP256-VRC26 lineage-member heavy chains, we performed intra-donor phylogenetic analysis14 on the unique 454 sequence set using the heavy chain sequences of the 12 isolated CAP256-VRC26 antibodies. A total of 707 sequences were identified as likely lineage members, of which 27 were discarded after manual inspection, leaving 680 unique CAP256-VRC26 lineage heavy chain sequences.

To identify light chain lineage members, a sieve requiring at least 92% sequence identity in CDR L3 to one of the isolated antibodies resulted in 495 sequences. Joinsolver64 was used to examine the V-J junctions of these sequences in detail, to ensure that the recombination points matched those known for the isolated antibodies (Supplementary Fig. 5). This gave a total of 472 unique CAP256-VRC26 lineage light chain sequences.

Paired reads that were identified as members of the CAP256-VRC26 lineage were clustered using CDHit63 at 95% sequence identity and consensus Vh and Vl sequences were generated for each cluster containing two or more pairs. BLAST was then used to align the resulting sequences to all clonally related sequences identified from the 454 sequencing as described above. Gapless alignments covering at least 190 nucleotides at 97% or greater sequence identity were considered to be matches. Two of the 157 paired sequences determined to be members of the CAP256-VRC26 lineage matched known CAP256-VRC26 lineage sequences in both Vh and Vl 454 data sets. An additional 4 Vh sequences and 1 Vl sequence were found in the 454 data, but their light or heavy chain partners were not present.
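
The match criterion for paired and 454 sequences can be written as a simple predicate. The sketch below assumes the two aligned segments have already been extracted from an alignment and are supplied as equal-length strings; it does not run an aligner, and the function name and arguments are hypothetical.

```python
def is_match(query_segment, subject_segment, min_length=190, min_identity=0.97):
    """Decide whether a gapless alignment meets the length and identity cutoffs."""
    if len(query_segment) != len(subject_segment):
        raise ValueError("gapless alignment segments must have equal length")
    if "-" in query_segment or "-" in subject_segment:
        return False  # any gap character disqualifies the alignment
    length = len(query_segment)
    if length < min_length:
        return False  # alignment covers too few nucleotides
    identical = sum(a == b for a, b in zip(query_segment, subject_segment))
    return identical / length >= min_identity
```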

Computation of phylogenetic trees

Phylogenetic trees were constructed from 454 data and the sequences of antibodies isolated from B cell culture. Raw data are shown in Nexus format in Supplementary Figs 12 and 13. MEGA5 (ref. 65) was used to select the general time-reversible model with a gamma-distributed rate parameter (GTR+G)66 as the best mathematical model for building a maximum-likelihood tree from the CAP256-VRC26 lineage sequences. FASTML67 was then used to estimate the gamma parameter and build separate maximum likelihood trees for heavy and light chain sequences (including the isolated antibodies) and these were rooted on the germline V gene sequences. Two branches of the light chain tree were manually moved to match their positioning in the heavy chain tree based on the evidence from trees constructed solely with the 12 isolated antibodies. Analysis with DNAML from PHYLIP (Phylogeny Inference Package) version 3.6 (Felsenstein, J. 2005. PHYLIP (Phylogeny Inference Package) version 3.6. Distributed by the author. Department of Genome Sciences, University of Washington, Seattle) (http://cmgm.stanford.edu/phylip/dnaml.html) showed that these rearrangements did not significantly alter the log-likelihood score of the tree.

To create a condensed version of the heavy chain phylogenetic tree (Fig. 4c), CDR H3 sequences were clustered using a 95% sequence identity threshold and requiring that all CDR H3s in a cluster have the same length. Isolated antibodies and monophyletic clusters with at least five members were represented by a single leaf, while all other sequences were removed from the tree. In cases where an internal node was deleted, branch lengths above and below that node were summed, so that the tree depths of all remaining sequences were maintained.
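
The branch-length bookkeeping used when condensing the tree can be illustrated with a small recursive pruning routine. This sketch covers only the pruning step: the set of leaves to retain is an input, and neither the CDR H3 clustering nor the monophyly test is implemented. The Node class and function names are assumptions made for the example.

```python
class Node:
    def __init__(self, name=None, length=0.0, children=None):
        self.name = name
        self.length = length          # branch length to the parent
        self.children = children or []


def prune(node, keep):
    """Return a copy of the subtree restricted to leaves named in `keep`.

    Internal nodes left with a single child are removed and their branch
    length is added to that child's, so root-to-leaf depths are unchanged.
    Returns None if no retained leaf lies below `node`.
    """
    if not node.children:
        return Node(node.name, node.length) if node.name in keep else None
    kept = [c for c in (prune(child, keep) for child in node.children) if c]
    if not kept:
        return None
    if len(kept) == 1:
        only = kept[0]
        only.length += node.length    # sum lengths above and below the deleted node
        return only
    return Node(node.name, node.length, kept)


def depth(node, name, above=0.0):
    """Summed branch length from the top of `node` down to leaf `name`."""
    total = above + node.length
    if not node.children:
        return total if node.name == name else None
    for child in node.children:
        found = depth(child, name, total)
        if found is not None:
            return found
    return None
```

Because `prune` works on copies, the depth of any retained leaf can be compared between the full and condensed trees to confirm that it is preserved.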

UCA and inferred intermediates

The phylogenetic trees of all heavy and all light chain lineage members calculated above (Fig. 3c and Extended Data Fig. 1) were input into the DNAML maximum likelihood software package to infer ancestral sequences. These are a direct consequence of the input sequences and the mathematical model used to build the trees; the gamma distribution found by FASTML above was used and the topology of the tree was held fixed, so no further information was added. The calculated heavy chain UCA was identical to the germline VH3-30*18 allele. Although the VH3-30*03 allele is only one nucleotide different from *18, germline sequencing of this donor showed that she carries the *18 allele and not the *03 allele (Cathrine Mitchell Scheepers, personal communication). The inferred UCA is very similar to low-divergence sequences found in the week 38 data set (Extended Data Fig. 6).

To test intermediates in the development of CAP256-VRC26.01, two internal nodes were chosen from the phylogenetic trees to be approximately equally spaced in terms of evolutionary distance and the inferred sequences were retrieved using DNAML. Successful complementation of inferred heavy and light chains for each intermediate suggests that the lineage is well sampled by the 454 data and that the calculated phylogenetic trees successfully capture the coupled evolutionary dynamics of heavy and light chains.

Logograms for CDR H3s were made with WebLogo68.

X-ray crystallography

VRC26-UCA Fab was prepared by digesting purified IgG with Lys-C at 37 °C for 2 h. The reaction was then quenched by the addition of cOmplete protease inhibitors (Roche). For VRC26.01, VRC26.03, VRC26.04, VRC26.06, VRC26.07 and VRC26.10 Fab preparation, an HRV3C recognition site (GLEVLFQGP) was inserted after Lys 235 and purified IgG was incubated with HRV3C protease overnight at 4 °C. For all Fabs, the digested antibodies were passed over Protein A agarose to remove the Fc fragment. The Fab was further purified over a Superdex 200 gel filtration column and concentrated aliquots were stored at −80 °C. All Fabs were screened against 576 crystallization conditions using a Cartesian Honeybee crystallization robot. Initial crystals were grown by the vapour diffusion method in sitting drops at 20 °C by mixing 0.2 μl of protein complex with 0.2 μl of reservoir solution. Crystals were manually reproduced in hanging drops by mixing 1.0 μl protein complex with 1.0 μl reservoir solution. VRC26-UCA was crystallized with a reservoir solution of 27% PEG 8000 and 0.1 M HEPES pH 7.5 and was flash frozen in liquid nitrogen with 20% PEG 400 as a cryoprotectant. VRC26.01 was crystallized with a reservoir solution of 32% PEG 400, 4% PEG 3350 and 0.1 M sodium acetate pH 5.5 and was flash frozen in liquid nitrogen with 20% ethylene glycol as a cryoprotectant. VRC26.03 was crystallized with a reservoir solution of 22% PEG 8000, 5% MPD and 0.1 M imidazole pH 6.5 and was flash frozen in liquid nitrogen with 20% xylitol as a cryoprotectant. VRC26.04 was crystallized with a reservoir solution of 14% PEG 3350, 25% isopropanol and 0.1 M Tris pH 8.5 and was flash frozen in liquid nitrogen with 20% ethylene glycol as a cryoprotectant. VRC26.06 was crystallized with a reservoir solution of 3 M sodium formate and 0.1 M Tris pH 7.5 and was flash frozen in liquid nitrogen with 20% xylitol as a cryoprotectant. VRC26.07 was crystallized with a reservoir solution of 4% PEG 8000, 0.1 M zinc acetate and 0.1 M MES pH 6 and was flash frozen in liquid nitrogen with 20% glycerol as a cryoprotectant. VRC26.10 was crystallized with a reservoir solution of 22% PEG 4000, 0.4 M sodium acetate and 0.1 M Tris pH 7.5 and was flash frozen in liquid nitrogen with no cryoprotectant.

Data for all crystals were collected at a wavelength of 1.00 Å at SER-CAT beamlines ID-22 and BM-22 (Advanced Photon Source, Argonne National Laboratory). All diffraction data were processed with the HKL2000 suite69 and model building and refinement were performed in COOT70 and PHENIX71, respectively. For VRC26.03 Fab data, a molecular replacement solution consisting of one Fab molecule per asymmetric unit was obtained using PHASER with a search model from PDB ID 3F12. VRC26.03 then served as a search model for all remaining VRC26 Fabs. Throughout the refinement processes, a cross validation (Rfree) test set consisting of 5% of the data was used and hydrogen atoms were included in the refinement model. Structure validations were performed periodically during the model building/refinement process with MolProbity72. Ribbon diagram representations of protein crystal structures were made with PyMOL73 and electrostatics were calculated and rendered with UCSF Chimera74.

Structure modelling on trimers

Defined locations of the V1V2, V3-glycan and CD4-binding sites were mapped directly onto EM density of the unliganded HIV-1 BAL spike (EMD-5019)50 using the software package UCSF Chimera74. The CD4-binding site was defined by aligning density of the VRC01-bound BAL spike (EMD-5457)75 with the unliganded map and fitting a crystal structure of VRC01-bound gp120 (PDB accession number 3NGB)76 to the density. EM density in close proximity to the Fab structure was coloured to highlight the region of contact. The same procedure was used to define the V3-glycan region using a PGT128-bound trimer (EMD-1970) and crystal structure (PDB accession number 3TYG)77 and the V1V2 region using the PG9-bound BG505 SOSIP trimer (EMD-2241)24 and a crystal structure of V1V2-bound PG9 (PDB accession number 3U4E)22. The fit of the PG9-V1V2 crystal structure to the SOSIP trimer was used to model the trimeric orientation of V1V2 using the threefold symmetry of the HIV-1 spike. The BG505.664 SOSIP crystal structure33, PDB 4NCO, was presented to highlight the quaternary location of V1V2 point mutations. Side chains of residues 166 and 167, not seen in the crystal structure, were modelled. The Man5 glycan at N160, also not seen in the crystal structure, is represented as in the crystal structure of the PG9-V1V2 complex (PDB accession number 3U4E).

Loop modelling

Two intermediates were calculated at approximately equal maturation distance along the VRC26-UCA to VRC26.01 pathway. Mutations associated with the intermediates were mapped directly onto the structure of VRC26.01. Fourteen of the 35 residues in the VRC26.01 structure are disordered and were modelled with Loopy78 (http://wiki.c2b2.columbia.edu/honiglab_public/index.php/Software:Loopy) and represented as grey dots. Mutations of the intermediates were coloured according to approximate time of occurrence based on the longitudinal phylogenetic tree, highlighting the timeline of the structural development. These, and the other antibodies with modelled loops (Fig. 4), were modelled in a single loop prediction involving four steps. In the first step, Loopy was used to predict 10 loop conformations. The number of initial loop conformations to be sampled was set to 50,000 (and not the default value of 2,000). In the second step, all 10 loop conformations were refined using the Protein Preparation Wizard in Maestro (http://www.schrodinger.com/). In the third step, sulphate groups were added to tyrosine at position 100 of the heavy chain and the entire structure was then subjected to all-atom energy minimization in Maestro. A fourth and final step was needed to ensure a reasonable sampling of the rotameric states for the sulphated tyrosines. The Rapid Torsion Scan module in Maestro was used to sample the chi angle involving the sulphate moiety in steps of 20 degrees. The model with the lowest energy after application of the Rapid Torsion Scan module was considered as the best prediction.

Tyrosine sulphation predictions were carried out in GPS-TSP (Z. Pan et al., http://tsp.biocuckoo.org).

Single genome amplification (SGA), sequencing and cloning

HIV-1 RNA was isolated from plasma using the Qiagen QIAamp Viral RNA kit, and reverse transcribed to cDNA using SuperScript III Reverse Transcriptase (Invitrogen, CA). The envelope genes were amplified from single genome templates49 and amplicons were directly sequenced using the ABI PRISM Big Dye Terminator Cycle Sequencing Ready Reaction kit (Applied Biosystems, Foster City, CA) and resolved on an ABI 3100 automated genetic analyser. The full-length env sequences were assembled and edited using Sequencher v.4.5 software (Genecodes, Ann Arbor, MI). Multiple sequence alignments were performed using Clustal X (ver. 1.83) and edited with BioEdit (ver. 7.0.9). Sequence alignments were visualized using Highlighter for Amino Acid Sequences v1.1.0 (beta).

For analysis of selection pressure, and to account for recombination between the SU and PI, sequences were partitioned into two alignments (an SU-related, and a PI-related alignment) based on the inferred recombination breakpoints using an in-house script. Breakpoints were identified by a shift in identity from one reference towards the other, and required at least two sequential polymorphisms in common with a corresponding PI/SU-related virus in order to be considered. Phylogenies for both alignments were then reconstructed using FastTree79 with a GTR+CAT model, and rooted on the PI/SU. Signals of selective pressure were detected with MEME (episodic diversifying selection)80 and DEPS (directional selection)81 using the FastTree-generated trees, implemented in Hyphy82.

The frequencies of specific amino acids at a site and the distribution of net charges in the V2 epitope were calculated from the 2012 filtered web alignment (n = 3,990) from the Los Alamos HIV database (http://www.hiv.lanl.gov/).
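
The two summary statistics named above (per-site amino acid frequency and net charge of an epitope segment) can be sketched as follows. This is an illustrative calculation, not the procedure used in the study: it assumes that Lys and Arg each contribute +1, that Asp and Glu each contribute −1, and that all other residues (including His) are counted as neutral, and it treats gap characters as ordinary symbols when tallying frequencies. Function names and the input layout are hypothetical.

```python
from collections import Counter

# Assumed charge model: K/R = +1, D/E = -1, everything else neutral.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge(peptide):
    """Net charge of a peptide segment under the simple model above."""
    return sum(CHARGE.get(aa, 0) for aa in peptide.upper())


def residue_frequencies(alignment, column):
    """Fraction of each symbol at a 0-based column of an aligned sequence set."""
    counts = Counter(seq[column] for seq in alignment)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}
```

Applied to every sequence in an alignment, `net_charge` gives the distribution of epitope charges, and `residue_frequencies` gives the prevalence of a particular residue at a site of interest.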

Selected envelope amplicons were cloned into the expression vector pcDNA 3.1 (directional) (Invitrogen) by re-amplification of SGA first-round products using Pfu Ultra II enzyme (Stratagene) with the EnvM primer, 5′-TAGCCCTTCCAGTCCCCCCTTTTCTTTTA-3′ (ref. 83) and directional primer, EnvAstop, 5′-CACCGGCTTAGGCATCTCCTATGGCAGGAAGAA-3′ (ref. 48). Cloned env genes were sequenced to confirm that they exactly matched the sequenced amplicon. Autologous clones were mutated at key residues within the C-strand using the Stratagene QuikChange II kit (Stratagene) as described by the manufacturer. Mutations were confirmed by sequencing. Envelope clones were used to generate single-round-of-replication Env-pseudoviruses as described above.