Inferring HIV-1 transmission networks and sources of epidemic spread in Africa with deep-sequence phylogenetic analysis

Ratmann, Oliver; Grabowski, M. Kate; Hall, Matthew; Golubchik, Tanya; Wymant, Chris; Abeler-Dörner, Lucie; Bonsall, David; Hoppe, Anne; Brown, Andrew Leigh; de Oliveira, Tulio; Gall, Astrid; Kellam, Paul; Pillay, Deenan; Kagaayi, Joseph; Kigozi, Godfrey; Quinn, Thomas C.; Wawer, Maria J.; Laeyendecker, Oliver; Serwadda, David; Gray, Ronald H.; Fraser, Christophe

doi:10.1038/s41467-019-09139-4

Article
Open access
Published: 29 March 2019

Nature Communications volume 10, Article number: 1411 (2019) Cite this article

Abstract

To prevent new infections with human immunodeficiency virus type 1 (HIV-1) in sub-Saharan Africa, UNAIDS recommends targeting interventions to populations that are at high risk of acquiring and passing on the virus. Yet it is often unclear who and where these ‘source’ populations are. Here we demonstrate how viral deep-sequencing can be used to reconstruct HIV-1 transmission networks and to infer the direction of transmission in these networks. We are able to deep-sequence virus from a large population-based sample of infected individuals in Rakai District, Uganda, reconstruct partial transmission networks, and infer the direction of transmission within them at an estimated error rate of 16.3% [8.8–28.3%]. With this error rate, deep-sequence phylogenetics cannot be used against individuals in legal contexts, but the error rate is sufficiently low for population-level inferences into the sources of epidemic spread. The technique presents new opportunities for characterizing source populations and for targeting of HIV-1 prevention interventions in Africa.

Introduction

Large generalized epidemics of human immunodeficiency virus type 1 (HIV-1) continue to cause substantial mortality and morbidity across much of sub-Saharan Africa¹. Rates of new infections have been reduced by adoption of prevention measures, especially antiretroviral therapy and medical male circumcision^1,2. Despite progress, incidence levels remain well above elimination thresholds³. There remains an urgent need to better understand the drivers of transmission such as differential transmission by sex and age groups, especially among young women who account for 74% of new infections among adolescents in sub-Saharan Africa⁴. This may enable better targeting of prevention measures to infected people who most likely act as sources of new infection, and thus reduce transmission amongst groups most likely to sustain the epidemic. HIV-1 evolves faster than transmissions occur, so that viral sequences obtained from an individual tend to be characteristic of that individual within weeks after infection^5,6. Therefore, viral genetic data have the potential to yield novel insights into the drivers of transmission by identifying who may have been a transmitter, and then by generalizing these findings to identify risk factors that can be directly targeted for prevention^7,8.

Currently, phylogenetic tools to identify sources of transmission are based on Sanger sequencing, which generates a single HIV-1 consensus sequence per virus sample from an individual^{9,10,11,12,13}. Typically one sample per individual is sequenced, and so the entire viral population from one individual is reduced into a single consensus sequence, which is insufficient to determine in which direction infections occurred¹⁴. For this reason source attribution methods have required data on dates of infection^15,16,17 or modelling assumptions on the epidemic^{9,10,12,18,19}. An advantage of source attribution methods based on additional modelling assumptions is that they may be applied with relatively small sample sizes, although it can be hard to disentangle assumptions from conclusions. For example, in ref. ¹², it was assumed that young women are predominantly infected by older men in KwaZulu-Natal, South Africa, and it is unclear to what extent the same conclusion is based on data²⁰. There is consequently a need for broadly applicable source attribution methods that are not dependent on external modelling assumptions to provide independent evidence.

Here, we demonstrate that HIV-1 transmission networks and the direction of transmission within them can be reconstructed from deep-sequence data of a large population-based sample of infected individuals with phyloscanner²¹, a recently developed software package for viral phylogenetic inference from deep-sequence data. The accuracy in reconstructing the direction of transmission is sufficient to infer source populations, i.e. the most likely drivers of the epidemic, without assumptions on the epidemic. This finding turns into practice the theoretical prediction by Romero-Severson et al.²² that individuals should be represented by clusters (in short: subgraphs) of viral sequences in phylogenies when many sequence reads per individual are available, and that the phylogenetic ordering of subgraphs should allow inference of the likely direction of transmission between individuals. Figure 1 illustrates this principle. Leitner and Romero-Severson²³ investigated which phylogenetic orderings of subgraphs (in short: subgraph topologies) can be expected among known transmission pairs. The primary aim of this study is the opposite, to establish what epidemiologic inferences can be made from observed patterns in deep-sequence phylogenies. Our population-level analysis is based on deep-sequence data that was cross-sectionally collected from 40 communities in the Rakai region of Southern Uganda. Rakai communities are predominantly small agrarian and semi-urban trading centres as well as fishing communities alongside Lake Victoria. The area was the initial epicentre of the HIV-1 epidemic in Eastern Africa, and today remains among the highest burdened districts in Uganda with an overall adult HIV prevalence that ranges from 9–26% among inland trading and agrarian communities to 38–43% among lakeside fishing communities^24,25.

We report first that it is feasible to obtain population-based samples of HIV-1 deep-sequence data that represent a large proportion of infected individuals with unsuppressed virus in a local setting in Africa. Second, we demonstrate that deep-sequence phylogenetic analysis can be scaled from pairs in whom transmission has been suspected to population-based samples of HIV-1 epidemics. We reconstruct partial transmission networks in the absence of self-reported sexual contact information and identify pairs of individuals in whom transmission and the direction of transmission are phylogenetically inferred with high statistical support, which we call source−recipient pairs. Third, we assess the strength of deep-sequence phylogenetic inferences on direct transmission between two individuals (in short: linkage) in a large population-based sample, and the direction of transmission between two individuals via potentially unsampled intermediates. Our major finding is that the direction of transmission from a source case to a recipient could be frequently estimated with high statistical support, and that accuracy levels are sufficient for inferences into the drivers of epidemic spread at the population-level.

Results

Large deep-sequence data set of an African HIV-1 epidemic

Between August 2011 and January 2015, 25,882 individuals aged 15–49 years were surveyed in 40 communities of the Rakai Community Cohort Study (RCCS) in Uganda (Table 1). The survey included the four largest fishing sites along Lake Victoria because of their high population-level HIV prevalence (~40%)²⁵ and hypothesized role in epidemic spread. 5142 participants were HIV-positive. Reflecting previous guidelines on initiation of antiretroviral therapy (ART) during the observation period, 3878 (75.4%) infected study participants reported no ART use at time of survey. Self-reported ART use was previously validated as a proxy for actual ART use²⁶, and 90% of individuals who reported using ART also had suppressed virus titres below 1000 copies per millilitre plasma blood². This prompted us to focus on viral sequencing among individuals who did not report ART use. Deep-sequencing of the virus genomes was performed on 3758/3878 (96.9%) samples using the Gall et al. protocol²⁷, generating thousands of short viral sequence fragments (reads) per individual. Sequencing success was comparatively modest²⁸. We restricted our analysis to samples from 2652 individuals that satisfied minimum criteria on read length and depth for phylogeny reconstruction and subsequent inferences (see Methods and Supplementary Figure 1). Women and individuals of 35 years or more were under-represented in this data set when compared to infected participants, whereas individuals in fishing sites were over-represented. The overall sequence sampling fraction was high, 68.4% (2652/3878) among infected participants who did not report ART use (Fig. 2), and an estimated 65.6% (2652/4043) among infected participants with unsuppressed virus (see Methods). 
If we assume that individuals who were not present or did not participate at survey visits were infected with unsuppressed virus in proportion to the enrolled population, an additional 1837 individuals likely did not have suppressed viraemia, leading to an estimated sequence sampling fraction of 45.1% (2652/5880) among eligible, infected individuals with unsuppressed virus. Accounting for the previous finding that ~30% of individuals were infected by a person outside the cohort¹¹, we thus expect that in approximately three of ten cases (0.451 × 0.7), our data contain the transmitter of a sequenced individual.
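The arithmetic behind the 45.1% sampling fraction and the "three of ten" expectation can be retraced in a few lines. This is a minimal sketch that uses only the counts quoted in the text; the variable names are illustrative and not taken from the study's analysis code.

```python
# Counts quoted in the text; variable names are illustrative assumptions.
sequenced = 2652                      # individuals meeting read length/depth criteria
unsuppressed_participants = 4043      # estimated among participants (see Methods)
unsuppressed_nonparticipants = 1837   # estimated among absent/non-participating individuals

eligible_unsuppressed = unsuppressed_participants + unsuppressed_nonparticipants
sampling_fraction = sequenced / eligible_unsuppressed

# About 30% of infections are attributed to sources outside the cohort,
# so about 70% of transmitters are within reach of the sample at all.
within_cohort = 0.7
p_transmitter_sampled = sampling_fraction * within_cohort

print(eligible_unsuppressed, round(sampling_fraction, 3), round(p_transmitter_sampled, 3))
```

The product of the two proportions treats sequence sampling as independent of whether the transmitter lives inside the cohort, which is the same simplification the text makes.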

Table 1 Characteristics of the study population

Scaling deep-sequence phylogenetics to large data sets

We first investigated the types of deep-sequence phylogenetic patterns that arise in known epidemiologic relationships. Our population-based sample comprised 331 concordant HIV-1-positive couples who self-identified as sexual partners. Based on previous partner analyses^16,17, we expected that virus was transmitted in approximately 70% of couples, and that the remaining couples were separately infected by other individuals. Figure 1d illustrates a typical scan of deep-sequence phylogenies across the genome for three male−female pairs. In each phylogeny, subgraphs of reads from two individuals could either be ancestral to each other (pink if virus of the female was ancestral and blue if virus of the male was ancestral), siblings (purple), intermingled (yellow), or disconnected by one or more other individuals (grey, see Methods for full definitions and Supplementary Tables 1–3 for command line specifications of the phyloscanner software). In addition, the shortest patristic distance between subgraphs of reads from two individuals (in short: subgraph distance) reflected genetic similarity of their viruses (y-axis). Figure 3a summarizes these deep-sequence phylogenetic patterns across known couples. We found, first, that the distribution of subgraph distances separating partners was bimodal (Fig. 3a, showing the median distance per pair across all their phylogenies after standardizing for differences in evolutionary rates across the genome). Most couples were either phylogenetically closely related or distantly related, with intermediate distances being very rare. This suggested that transmission likely occurred among phylogenetically closely related couples, and allowed us to define distance thresholds below which transmission was likely and above which transmission could be ruled out in this population (respectively <0.025 substitutions per site and >0.05 substitutions per site, see Fig. 3a). 
Additional analysis of whole-genome consensus sequences further supported these findings and thresholds (Supplementary Note 2 and ref. ²⁹). Second, we found that the large majority (166/178, 93.3%) of phylogenetically close couples also had ancestral subgraphs in most deep-sequence phylogenies, indicating in line with Leitner and Romero-Severson²³ that ancestral subgraph topologies are strongly over-represented among true transmission pairs.
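The two distance thresholds amount to a three-way classification of each pair by its median subgraph distance. The sketch below illustrates that rule only; the function name and the example distances are hypothetical, and the standardization for evolutionary rate differences across the genome is assumed to have been applied already.

```python
from statistics import median

CLOSE_MAX = 0.025    # substitutions per site; below this, transmission was considered likely
DISTANT_MIN = 0.05   # substitutions per site; above this, direct transmission was ruled out

def classify_pair(window_distances):
    """Classify a pair by the median subgraph distance across its phylogenies.

    window_distances: standardized subgraph distances, one per genomic window.
    """
    d = median(window_distances)
    if d < CLOSE_MAX:
        return "close"
    if d > DISTANT_MIN:
        return "distant"
    return "intermediate"

# Hypothetical distances for three pairs
print(classify_pair([0.01, 0.02, 0.03]))
print(classify_pair([0.06, 0.08, 0.04]))
print(classify_pair([0.03, 0.04]))
```

Leaving a gap between the two cut-offs keeps the rare intermediate pairs out of both categories instead of forcing a decision on them.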

Crucially, molecular epidemiologic analyses aim to infer unknown epidemiologic relationships from observed phylogenetic patterns in a population-based sample. This is a harder analytical problem compared to characterizing phylogenetic patterns among known epidemiologic relationships as in Fig. 3a, because only a tiny proportion of all pairs of individuals in a population-based sample are transmission pairs. We calculated the same phylogenetic patterns among all 3,515,226 possible pairs in our sample of 2652 individuals (see Methods), and summarized them in Fig. 3b as for the couples. With the exception of the 331 couples, sexual contacts were not known among any other of the ~3.5 million possible pairs. We found that ancestral subgraph topologies centred among pairs who were phylogenetically close: of 814 pairs with mostly ancestral subgraphs, 694 (85.3%) had phylogenetically close virus below our threshold for likely direct transmission (0.025 substitutions per site). However, 48 (5.9%) pairs had divergent virus above our threshold for ruling out direct transmission (0.05 substitutions per site). In addition, ancestry missed 118 (14.5%, 118/(694 + 118)) phylogenetically close pairs that had intermingled or sibling subgraphs in most of their deep-sequence phylogenies. Therefore, we used all types of subgraph topologies in combination with subgraph distance for inference of transmission networks from deep-sequence data. It is possible to approximate the likelihood of deep-sequence phylogenetic patterns under mathematical models of within-host viral evolution and transmission³⁰. However, such models do not fully reproduce empirical observations such as preferential transmission of founder viruses³¹, and can be computationally prohibitive at large scales.
For these reasons we adopted a statistical approach that is based on counting phylogenetic patterns across the genome, and calculating the proportion of deep-sequence phylogenies in support of no linkage ($\hat{\mu}_{ij}$), linkage ($\hat{\lambda}_{ij}$), and direction of transmission given linkage ($\hat{\delta}_{ij}$); see Fig. 4 and Methods. Starting with subgraph distance, direct transmission could be ruled out for 3,513,800/3,515,226 (99.96%) pairs, leaving only 1426 potential transmission pairs. Next, we also considered information in subgraph topologies. This left 1191 potential transmission pairs that formed 446 transmission networks in the population-based sample of 2652 individuals, i.e. groups of individuals that had predominantly phylogenetically close and topologically adjacent (ancestral, intermingled or sibling) subgraphs.
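The counting approach can be illustrated with plain proportions over genomic windows. The sketch below is a simplification under stated assumptions: the rules for which windows count as supporting linkage or no linkage are illustrative choices, any correction for overlapping windows is ignored, and the function and label names are hypothetical. The exact estimators are those defined in Methods.

```python
ADJACENT = {"ancestral_12", "ancestral_21", "intermingled", "sibling"}

def support_scores(windows, close_max=0.025, distant_min=0.05):
    """Simple window-counting proportions for one pair of individuals (1 and 2).

    windows: list of (topology, distance) tuples, one per genomic window.
    Assumed rules (illustrative only):
      - a window supports linkage if subgraphs are adjacent and close;
      - a window supports no linkage if subgraphs are disconnected or distant;
      - direction is scored among linkage windows with an ancestral topology.
    """
    if not windows:
        raise ValueError("need at least one genomic window")
    n = len(windows)
    linked = [t for t, d in windows if t in ADJACENT and d < close_max]
    unlinked = [t for t, d in windows if t == "disconnected" or d > distant_min]
    n_12 = linked.count("ancestral_12")
    n_21 = linked.count("ancestral_21")
    directional = n_12 + n_21
    return {
        "mu": len(unlinked) / n,        # proportion supporting no linkage
        "lambda": len(linked) / n,      # proportion supporting linkage
        "delta_12": n_12 / directional if directional else None,  # support for 1 -> 2
    }

# Hypothetical scan of five windows for one pair
example = [
    ("ancestral_12", 0.010),
    ("ancestral_12", 0.015),
    ("ancestral_21", 0.020),
    ("intermingled", 0.010),
    ("disconnected", 0.080),
]
scores = support_scores(example)
print(scores)
```

Comparing such proportions against a cut-off (0.6 in this study) then gives a yes/no call on linkage and on direction for each pair.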

Unlike typical phylogenetic clusters^11,12,32,33, these transmission networks contained information on the direction of transmission (Fig. 5). Two hundred and sixty-one networks comprised just two individuals, while 36 had more than five individuals. As expected given the uncertainty in our inferences, larger networks included cycles of possible transmission flows and recipients with more than one probable source case, implying that multiple transmission chains were consistent with our phylogenetic data. We next identified the most likely transmission chains using graph theory (see Methods). This retained 888 phylogenetic linkages in 446 most likely transmission chains, of which 351 linkages had low statistical support ($\hat{\lambda}_{ij} \le 0.6$, see Fig. 4 and Methods for choice of threshold) and 537 linkages had high statistical support ($\hat{\lambda}_{ij} > 0.6$).
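One simple way to see how cycles can be removed from a network is a maximum-weight spanning forest over linkage scores, sketched below with Kruskal's algorithm. This is an undirected stand-in chosen for illustration, not necessarily the procedure described in Methods, which also has to account for direction; all names and scores are hypothetical. A forest over the 1334 networked individuals reported in the Discussion, split into 446 components, has 1334 − 446 = 888 edges, which agrees with the 888 linkages retained.

```python
def max_spanning_forest(n_nodes, edges):
    """Keep the highest-scoring cycle-free set of linkages (Kruskal's algorithm).

    edges: list of (score, i, j) with node indices in range(n_nodes).
    Returns the retained edges, one tree per connected network.
    """
    parent = list(range(n_nodes))

    def find(x):
        # Union-find lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    kept = []
    for score, i, j in sorted(edges, reverse=True):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:          # adding this edge does not close a cycle
            parent[root_i] = root_j
            kept.append((score, i, j))
    return kept

# Hypothetical scores: a three-person cycle plus a separate pair
edges = [(0.9, 0, 1), (0.8, 1, 2), (0.3, 0, 2), (0.7, 3, 4)]
chains = max_spanning_forest(5, edges)
print(chains)
```

In this toy input the weakest edge of the triangle is dropped, leaving two cycle-free chains.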

Viral deep-sequence data cannot prove HIV-1 transmission

We hypothesized that many of the 537 highly supported phylogenetic linkages were false discoveries in that transmission did not occur directly between the paired individuals. Our population-based sample did not capture all members of ongoing transmission chains, and so transmission likely occurred via unsampled intermediates in some cases. 80/537 (14.9%) of highly supported phylogenetic linkages were between two women even though HIV-1 is predominantly sexually transmitted in Africa, and extremely rarely transmitted sexually between women³⁴. Considering that there were almost twice as many possible male−female combinations as female−female combinations, we calculate in Supplementary Note 3 that up to 35.4% of phylogenetically close male−female pairs of the population-based sample may not represent direct transmission events. Figure 4b illustrates this fundamental problem further: subgraph distances and topologies were not sufficient to clearly separate pairs of individuals from the population sample into two groups of closely related or distantly related pairs.

In prior work, Romero-Severson et al.²² proposed that direct transmission can be established with near certainty when viral sequences from two individuals are heavily intermingled in deep-sequence phylogenies. This prediction, while based on theoretical evolutionary principles and simulation, implies that deep-sequence phylogenies could be used in criminal cases of HIV-1 transmissions, and thus has important public health and human rights implications.

We revisited this hypothesis in our data, and found 34 phylogenetically close pairs with intermingled subgraphs across the majority of the genome. In two instances, the phylogenetically linked individuals were female (Fig. 6, corresponding deep-sequence phylogenies are reported in Supplementary Data 1), suggesting they were likely infected by a common unobserved male partner. Based on this, the phylogenetic linkages in transmission networks that we inferred from our deep-sequence data may indicate—but cannot prove—direct transmission. The difference between the theoretical expectations of Romero-Severson et al.²² and our observations may be explained by limited phylogenetic resolution in our reads, or may reflect greater complexity in HIV-1 evolutionary dynamics³⁵.

These findings put into context that 81 (15.1%) of the 537 highly supported phylogenetic linkages were between two men. Given that the relative proportion of same-sex linkages was equivalent between men and women, our phylogenetic transmission networks provide no evidence of extensive sub-epidemics amongst men who have sex with men in rural Rakai, although we cannot rule out the possibility that these may exist due to potential undersampling of widely stigmatized key populations³⁶.

The direction of transmission can be frequently inferred

We further analysed the remaining 376 highly supported male−female linkages to infer the direction of transmission (i.e. who might have infected whom, potentially via unsampled intermediates). Amongst the population-based sample, we inferred the phylogenetically likely source for 293/376 (77.9%) of linked male−female pairs (Fig. 5, $\hat{\delta}_{ij} > 0.6$, see Methods for choice of thresholds). In comparison, 176/376 (46.8%) of highly supported male−female linkages were between couples, and the phylogenetically likely source could be inferred in 133/176 (75.6%) couples. Inferences of these source−recipient pairs did not depend strongly on our cut-off choices (Supplementary Table 4).

Inferring the direction of transmission has a small error

We cross-validated our findings on the direction of transmission using HIV-1 testing history and clinical data that provided independent evidence that one direction of transmission was much more likely than the other. In 36 pairs (18 couples and 18 pairs between whom sexual contact was not known), one individual tested HIV-1 negative after the other had already tested positive, and the negative individual subsequently seroconverted. The phylogenetically inferred source ($\hat{\lambda}_{ij} > 0.6$ and $\hat{\delta}_{ij} > 0.6$) was consistent with clinical evidence in 27/36 pairs, inconsistent in 4/36 pairs, and could not be inferred reliably in 5/36 pairs (Table 2; corresponding deep-sequence phylogenies are reported in Supplementary Data 2). The false discovery rate for estimating the direction of transmission amongst pairs with epidemiologically known direction of transmission was therefore 12.9% (4/31 pairs with an inferred source) with 95% confidence interval [5.1–28.9%].

Table 2 Error rates in inferring the direction of HIV-1 transmission

In 35 pairs, one individual had a CD4 cell count above 800 cells per mm³ blood, indicative of being close to time of infection, while their partner was already immuno-compromised with a CD4 cell count below 400 cells per mm³ blood. The phylogenetically inferred source was consistent with clinical evidence in 19/35 pairs, inconsistent in 5/35 pairs, and could not be inferred reliably in 11/35 pairs. In two of the five inconsistent cases, CD4 data were only weakly indicative of the direction of transmission, and it is possible that we overestimated error rates for these pairs with CD4 data to 20.8% [9.2–40.5%] (Supplementary Note 4).

Amongst all pairs, the false discovery rate was 16.3% [8.8–28.3%]. Error rates varied slightly depending on the exact configuration of parameters in the phyloscanner analyses, though not substantially (Supplementary Tables 5–6). Similar error rates were observed in phylogenetic analysis of 454 deep-sequence data over a 320 bp region of the env gene among 33 couples with known direction of transmission and confirmed linked infection in the HPTN 052 trial³⁷. Our findings are based on deep-sequencing of a population-based sample, and thus extend previous results to population-level inferences among individuals between whom sexual contact is not necessarily known a priori.
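The false discovery rates above are the number of inconsistent calls divided by the number of pairs in which a source was inferred (consistent plus inconsistent), leaving out pairs without a reliable call. The sketch below recomputes them with a Wilson score interval. The choice of interval is an assumption, since the text does not name the method, although it appears to reproduce the reported intervals to within about 0.1 percentage points.

```python
from math import sqrt

def fdr_with_wilson_ci(inconsistent, consistent, z=1.96):
    """False discovery rate among pairs with an inferred source, with a
    Wilson score interval (assumed method; z=1.96 for roughly 95% coverage)."""
    n = inconsistent + consistent
    p = inconsistent / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, centre - half_width, centre + half_width

# Counts quoted in the text
testing_history = fdr_with_wilson_ci(inconsistent=4, consistent=27)
cd4_data = fdr_with_wilson_ci(inconsistent=5, consistent=19)
all_pairs = fdr_with_wilson_ci(inconsistent=4 + 5, consistent=27 + 19)

for label, (p, lo, hi) in [("testing history", testing_history),
                           ("CD4 data", cd4_data),
                           ("all pairs", all_pairs)]:
    print(f"{label}: {100 * p:.1f}% [{100 * lo:.1f}-{100 * hi:.1f}%]")
```

Pairs in which the direction could not be inferred do not enter the denominator, which is why the combined rate is based on 55 of the 71 validation pairs.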

Discussion

A central application of pathogen sequencing is to identify how infectious diseases continue to spread in human populations, and how new infections can be averted most effectively^38,39,40,41. Most molecular epidemiologic studies are based on analysis of Sanger sequences, and typically identify clusters of genetically related infections in an effort to characterize ongoing transmission sources^11,32,33,42. These approaches fail to distinguish sources from recipients of transmission within such clusters, making epidemiological inferences relevant to public health intervention challenging⁷. In contrast, deep-sequence phylogenetic analyses are based on thousands of reads per individual, and thereby provide more information into the epidemiologic relationship of individuals beyond distance measures, through the topological ordering between subgraphs of viral reads from individuals. Prior work assessed the potential of deep-sequence phylogenetic analyses on simulations and on known transmission pairs for whom at least five viral sequences were available per individual^22,23,43. Here, we demonstrate that large population-based samples of standard deep-sequence output can be used to infer directed transmission networks of generalized HIV-1 epidemics in sub-Saharan Africa with phyloscanner²¹. Combining the patristic distance between viral subgraphs and their topological ordering in deep-sequence phylogenies, our analysis uncovered 446 partially sampled HIV-1 transmission networks in Rakai comprising 1334 individuals.

We were not able to rule out the possibility that sources were indirectly linked to recipients through unobserved individuals (i.e. intermediate partners) with deep-sequence phylogenetic analysis. One third (161/537) of phylogenetically highly supported linkages were between individuals of the same gender, in line with incomplete sequence coverage. We also found two pairs with phylogenetic patterns previously considered strong enough to virtually exclude the possibility of common sources or recipients, but in whom both individuals were female. These findings have important implications for criminal prosecution of people living with HIV in at least 72 countries with laws penalizing HIV transmission^14,44: even with deep-sequencing, transmission of HIV-1 cannot be proven between two individuals. Thus, communicating the limitations of deep-sequencing data is essential to prevent its misuse in criminal prosecutions. For example, we opted to visually interrupt linkages in phylogenetic transmission networks (Fig. 5), in order to highlight the possibility of unsampled cases along inferred source−recipient relationships.

We found that when many reads from different individuals are analysed together, they tend to form subgraphs with consistent ordering in deep-sequence phylogenies from across the genome. This observation enabled us to infer the source of transmission in 77.9% of 376 phylogenetically linked male−female pairs. The accuracy of our viral phylogenetic inferences regarding directionality was validated on 71 male−female pairs with clinical data that suggested transmission in one direction, with an overall false discovery rate of 16.3% [8.8–28.3%], and was thus not substantially different in a population-based sample compared to analysis of couples with known direction of transmission³⁷. At this error rate, phyloscanner and similar approaches^21,37,43 allow inferences into population-level transmission networks and the epidemiologic sources of ongoing viral spread from sequence data alone.

Our study has several weaknesses. First, sequence sampling of the infected population in RCCS communities remained incomplete. Phylogenetic inferences are expected to improve with higher sampling fraction⁴⁵, though in practice, complete sequence sampling is hard to achieve. This study enrolled participants before immediate provision of ART was recommended in national guidelines, so that a relatively large proportion of infected individuals did not report ART use at first study visit, and could be sequenced. To perform similar phylogenetic analyses of ongoing viral spread in sub-Saharan Africa in the future, it is thus important to collect and store samples prior to ART initiation, and to investigate alternative sequencing protocols⁴⁶. Second, relatively modest deep-sequencing quality compromised the length of deep-sequence reads²⁸. Analyses were based on relatively short read alignments of 250 bp that primarily covered the gag gene, rather than the whole genome (Supplementary Figure 1). It is thus plausible that deep-sequence phylogenetic analyses may be more accurate than reported in this study as deep-sequence output with longer reads and greater coverage is becoming available⁴⁷. Third, we found that inferring the direction of transmission became more challenging as the virus was increasingly closely related within individuals. We thus predict that the direction of transmission may be less frequently inferable in situations when the virus spreads more rapidly between persons, as in high-risk sexual networks among men having sex with men^9,15, or among injecting drug users⁴⁸. For the same reason, sources of infections may be less accurately and/or less frequently inferable for pathogens that generate within-host viral diversity at a slower pace than HIV-1 ^39,49,50.

Whole-genome deep-sequencing is now the tool of choice in clinical practice and epidemiologic investigation for a broad range of bacterial infectious disease pathogens, and increasingly used for viral pathogens, and especially HIV-1 ^{8,38,39,49,50}. Here we establish that HIV-1 phylogenetic analyses can be scaled to large population-based samples of deep-sequence data, and that the direction of transmission can be frequently inferred in reconstructed HIV-1 transmission networks. At present, more than 15,000 individuals have been deep-sequenced and linked to demographic records across sub-Saharan Africa in order to understand who is at the core and driving new infections where the burden of HIV-1 is highest, how the epidemic regenerates from older to younger generations, and how spread can be most effectively interrupted in generalized epidemics^7,8. The phyloscanner method is applicable to these data, and we hypothesize that this innovation will help identify the key drivers of HIV-1 transmission in regions that are hardest hit by the virus, and in turn facilitate tailoring of interventions to achieve epidemic control.

Methods

Sample selection

Data for this study come from the Rakai Community Cohort Study (RCCS), a population-based study of HIV-1 incidence in Rakai District, Uganda. Procedures for the RCCS have been described in detail elsewhere². Briefly, the RCCS conducts a census in all communities to identify eligible individuals 2 weeks before the survey. Eligible individuals include those able to give consent and between the ages of 15 and 49 years. Eligible individuals who provide written informed consent are administered a survey on their demographics, sexual behaviours and health-care seeking practices. Individuals are also asked to name their cohabitating sexual partners in order to identify couples, and to provide a serum sample for HIV-1 testing and future laboratory studies, including HIV-1 viral sequencing. Data for this particular study were collected between 2011 and 2015 from 40 agrarian, trading and fishing communities.

Ethics

The study was independently reviewed and approved by the Ugandan Virus Research Institute, Scientific Research and Ethics Committee, Protocol GC/127/13/01/16; the Ugandan National Council of Science and Technology; and the Western Institutional Review Board, Protocol 200313317. All study participants provided written informed consent at baseline and follow-up visits using institutional review board-approved forms.

Sampling fraction

To estimate the number of infected participants with unsuppressed virus, we first calculated the expected number of infected participants who did not use antiretrovirals at time of survey, and thus had unsuppressed virus. Participant-reported ART use was previously validated as a proxy of actual ART use with a specificity of 99%²⁶, giving 3878/0.99 individuals. To this, we added the expected number of participants who reported ART use but did not have suppressed virus. Ten per cent of participants reporting ART use had plasma viral loads above 1000 copies/ml plasma blood², giving 1264 × 0.1 individuals, and 4043 in total. The sampling fraction was therefore estimated at 2652/4043 (65.6%) among infected participants with unsuppressed virus.
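The estimate can be retraced step by step from the counts given in Results. The sketch below takes the 10% of ART-reporting participants with viral loads above 1000 copies/ml as the unsuppressed share; variable names are illustrative, and the total is truncated to a whole number to match the reported 4043.

```python
hiv_positive = 5142
no_art_reported = 3878
art_reported = hiv_positive - no_art_reported      # 1264 participants

specificity = 0.99           # of self-reported ART use as a proxy for actual use
unsuppressed_on_art = 0.10   # share of ART-reporting participants above 1000 copies/ml

expected_not_on_art = no_art_reported / specificity            # about 3917
expected_unsuppressed_on_art = art_reported * unsuppressed_on_art  # about 126
unsuppressed_total = int(expected_not_on_art + expected_unsuppressed_on_art)

sequenced = 2652
sampling_fraction = sequenced / unsuppressed_total
print(art_reported, unsuppressed_total, round(100 * sampling_fraction, 1))
```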

HIV-1 deep-sequencing

Serum samples from HIV-1 seropositive persons who did not self-report ART use over the analysis period were shipped to University College London Hospital, London, United Kingdom for viral RNA extraction. RNA extraction was automated on QIAsymphony SP workstations with the QIAsymphony DSP Virus/Pathogen Kit (Cat. No. 937036, 937055; Qiagen, Hilden, Germany), followed by one-step reverse transcription polymerase chain reaction (RT-PCR)²⁷. Deep-sequencing was performed on Illumina MiSeq and HiSeq instruments in the DNA pipelines core facility at the Wellcome Trust Sanger Institute, Hinxton, United Kingdom.

Assembly of HIV-1 reads

Deep-sequencing reads were assembled with the shiver sequence assembly software⁵¹. Where no contigs could be generated with IVA⁵², contigs were generated with SPAdes and metaSPAdes v3.10 ^53,54, after excluding reads classified as Homo sapiens by Kraken v0.10.5-beta⁵⁵. Contigs with at least 300 bp matching known HIV-1 diversity were used for shiver analysis.

Read selection

Phyloscanner version 1.1.2 ²¹ was used to merge paired-end reads, and only merged reads of at least 250 bp in length were retained for phylogeny reconstruction. Subsequent deep-sequence inferences were performed on individuals whose reads covered the HIV-1 genome at a depth of at least 30 reads for 750 bp or more. Individuals who did not have sequencing output meeting these criteria were excluded.
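The inclusion criterion can be written as a simple filter on per-position read depth. This is a hedged sketch, not the authors' code: the function name and input format are hypothetical, and it assumes that the 750 bp are counted across the whole genome and need not be contiguous, which the text does not specify.

```python
from typing import Sequence

MIN_DEPTH = 30         # minimum read depth per genome position (from the text)
MIN_COVERED_BP = 750   # minimum number of positions at that depth (from the text)


def passes_coverage_filter(depth: Sequence[int],
                           min_depth: int = MIN_DEPTH,
                           min_covered_bp: int = MIN_COVERED_BP) -> bool:
    """Return True if enough genome positions reach the minimum depth.

    Assumption: covered positions are summed over the genome and do not
    have to form one contiguous stretch.
    """
    covered = sum(1 for d in depth if d >= min_depth)
    return covered >= min_covered_bp


# Hypothetical depth profiles: one individual passes, one is excluded.
print(passes_coverage_filter([45] * 800 + [3] * 8200))
print(passes_coverage_filter([45] * 500 + [3] * 8500))
```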

Deep-sequence phylogenetic analysis

It proved computationally intractable to reconstruct viral trees from all deep-sequence reads of all individuals simultaneously. To address this challenge, samples were divided into batches of 50−75 individuals, and phyloscanner was run on all possible pairs of batches to assess deep-sequence phylogenetic relationships in all pairs of individuals in the population-based sample. The phyloscanner command line specification for this first analysis stage is given in Supplementary Tables 1 and 2. Shell scripts were used to handle calculations in parallel, and are available upon request. From stage 1 output, we identified potentially phylogenetically close pairs and, from those, networks of pairs that were connected through at least one common, phylogenetically close individual. Networks were extended to include spouses of partners in networks, couples in no network, and the ten most closely related individuals from stage 1 as controls. For computational considerations, reads of individuals that differed at one nucleotide position were merged. In a second analysis stage, phyloscanner was used to confirm potential transmission pairs by considering also the topological configuration of subgraphs in deep-sequence phylogenies, and to resolve the ordering of transmission events within transmission networks. The phyloscanner command line specification for stage 2 is given in Supplementary Table 3. In this stage, reads of individuals that differed at one nucleotide position were not merged.
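The batching scheme of the first stage can be sketched as follows. This is an illustration under stated assumptions rather than the shell scripts used in the study: the helper names and the batch size of 60 are hypothetical (the text gives a range of 50 to 75), and each run is taken to be the union of two distinct batches.

```python
from itertools import combinations


def make_batches(individuals, batch_size=60):
    """Split the list of individuals into consecutive batches."""
    return [individuals[i:i + batch_size]
            for i in range(0, len(individuals), batch_size)]


def batch_pair_runs(batches):
    """Yield (a, b, members) for every unordered pair of distinct batches."""
    for a, b in combinations(range(len(batches)), 2):
        yield a, b, batches[a] + batches[b]


ids = [f"ID{i}" for i in range(130)]    # hypothetical identifiers
batches = make_batches(ids)
runs = list(batch_pair_runs(batches))
print(len(batches), "batches,", len(runs), "runs")
```

With at least two batches, every pair of individuals appears together in at least one run, including pairs from the same batch, at the cost of a number of runs that grows quadratically in the number of batches.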

Phylogenetic relationships of virus from two individuals

Viral phylogenetic analysis with phyloscanner is based on subgraphs: sets of tips and internal nodes of a phylogeny that are attributed to one individual with a parsimony-based algorithm²¹. A single individual can have multiple subgraphs in one tree. The following statistics were calculated to characterize the phylogenetic relationship between two individuals i and j in one phylogeny:

Subgraph distance between i and j (Δ_ij): The distance between any two subgraphs u, v is the shortest patristic distance between any nodes or tips of u and v and Δ_ij is the minimum patristic distance between subgraphs u from i and v from j. Deep-sequence phylogenies from different parts of the genome had markedly different branch lengths, reflecting evolutionary rate variation across the genome. Prior to calculating subgraph distances, we standardized phylogenies by multiplying branch lengths with the ratio of expected branch lengths in the genomic window from which the tree was reconstructed, divided by the expected branch lengths in the gag and polymerase genes (Supplementary Table 2).
Adjacency of i and j (A_ij): True if the shortest path between at least one subgraph u from i and v from j is not attributed to any sampled individual other than i and j, and false otherwise.
Paths from i to j (P_ij): number of subgraphs from j which have as ancestor a subgraph from i.

Analyses were then based on the following phylogenetic relationship types between two individuals i and j in a viral tree:

Phylogenetically unlinked (U_ij): A_ij = 0 or Δ_ij > 0.05 substitutions per site.
Phylogenetic linkage grey zone (G_ij): A_ij = 1 and Δ_ij ∈ [0.025, 0.05] substitutions per site.
Phylogenetically linked and i source (i → j): A_ij = 1 and P_ij ≥ 1 and P_ji = 0 and Δ_ij < 0.025 substitutions per site.
Phylogenetically linked and j source (j → i): A_ij = 1 and P_ji ≥ 1 and P_ij = 0 and Δ_ij < 0.025 substitutions per site.
Phylogenetically linked with no evidence for direction of transmission (i ~ j): A_ij = 1 and P_ji ≥ 1 and P_ij ≥ 1 and Δ_ij < 0.025 substitutions per site (intermingled), or A_ij = 1 and P_ji = 0 and P_ij = 0 and Δ_ij < 0.025 substitutions per site (sibling).
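The five relationship types above amount to a small decision rule on A_ij, Δ_ij, P_ij and P_ji. The sketch below restates that rule; the function name and return labels are illustrative and are not part of phyloscanner.

```python
def classify_pair(adjacent: bool, dist: float, p_ij: int, p_ji: int,
                  close: float = 0.025, distant: float = 0.05) -> str:
    """Classify the relationship of individuals i and j in one phylogeny.

    adjacent: A_ij; dist: subgraph distance in substitutions per site;
    p_ij / p_ji: number of paths from i to j and from j to i.
    """
    if not adjacent or dist > distant:
        return "unlinked"
    if dist >= close:
        return "grey_zone"
    if p_ij >= 1 and p_ji == 0:
        return "i_to_j"
    if p_ji >= 1 and p_ij == 0:
        return "j_to_i"
    # Both directions present (intermingled) or neither (sibling).
    return "no_direction"


print(classify_pair(True, 0.010, 2, 0))
print(classify_pair(True, 0.030, 2, 0))
print(classify_pair(False, 0.010, 2, 0))
```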

Evidence for transmission and direction of transmission

To capture uncertainty in inferences, relationship types between reads from two individuals were evaluated on a large number of deep-sequence phylogenies that corresponded to sliding and overlapping read alignments (as shown in Fig. 1d). For each pair of individuals, the number of deep-sequence phylogenies in which i and j had each of the above five relationship types was counted (as shown in Fig. 4). The raw counts were adjusted for overlap in the read alignments from which the deep-sequence phylogenies were constructed, as described in Supplementary Note 1, and are denoted by k_U (unlinked), k_G (grey zone), $k_{i \to j}$ (i source), $k_{j \to i}$ (j source) and $k_{i \sim j}$ (no evidence for direction). After adjusting for overlap, the counts were interpreted as phylogenetically independent observations, leading to Binomial probability models for each count. Evidence for direct transmission (λ_ij) was based on the count $k_{\mathrm L} = k_{i \to j} + k_{j \to i} + k_{i \sim j} \ge 0$ and the binomial model (likelihood)

$$p\left( {k_{\mathrm L},n{\mathrm{|}}\lambda _{ij}} \right) = \frac{{{\mathrm{\Gamma }}(n + 1)}}{{{\mathrm{\Gamma }}(k_{\mathrm L} + 1){\mathrm{\Gamma }}(n - k_{\mathrm L} + 1)}}\lambda _{ij}^{k_{\mathrm L}}(1 - \lambda _{ij})^{n - k_{\mathrm L}},$$

(1)

where $n = k_{i \to j} + k_{j \to i} + k_{i \sim j} + k_{\mathrm G} + k_{\mathrm U} > 0$ and Γ is the Gamma function, with maximum likelihood estimate $\hat \lambda _{ij} = k_{\mathrm L}/n$. Evidence for ruling out direct transmission (μ_ij) was based on k_U and total n as above. Evidence for the direction of transmission given linkage (δ_ij) was based on $k_{i \to j}$ and total $k_{i \to j} + k_{j \to i}$. Posterior density estimates of λ_ij, μ_ij and δ_ij are available analytically when a Beta prior density on these parameters is chosen. We here chose a flat Beta prior density with both shape parameters set to 1, so that e.g. the posterior density for direct transmission is

$$p\left( {\lambda _{ij}{\mathrm{|}}k_{\mathrm L},n} \right) = \frac{{{\mathrm{\Gamma }}(n + 2)}}{{{\mathrm{\Gamma }}(k_{\mathrm L} + 1){\mathrm{\Gamma }}(n - k_{\mathrm L} + 1)}}\lambda _{ij}^{k_{\mathrm L}}(1 - \lambda _{ij})^{n - k_{\mathrm L}}.$$

(2)

The confidence intervals shown in Supplementary Notes 2 and 4 are 95% highest density intervals of Eq. (2). In principle, the parameters of the Beta prior could be chosen to reflect additional data such as seroconversion histories; however, care should be taken to specify informative priors based on variables such as age differences or age-specific disease prevalence²⁰, in order to avoid circular inferences on who may have infected whom.
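With a flat prior, the posterior for λ_ij is a Beta density with parameters k_L + 1 and n − k_L + 1, so point estimates and a highest density interval can be computed directly. The sketch below uses only the Python standard library and approximates the interval on a grid; the function names, the grid size and the example counts are assumptions made for illustration and are not taken from the study.

```python
import math


def beta_log_pdf(x, a, b):
    """Log density of a Beta(a, b) distribution at 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x)


def posterior_summary(k, n, mass=0.95, grid=20001):
    """Summarise the flat-prior posterior Beta(k + 1, n - k + 1).

    Returns the maximum likelihood estimate k/n, the posterior mean and a
    grid approximation of the highest density interval (requires n > 0).
    """
    a, b = k + 1, n - k + 1
    xs = [(i + 0.5) / grid for i in range(grid)]      # cell midpoints in (0, 1)
    dens = [math.exp(beta_log_pdf(x, a, b)) for x in xs]
    # Add cells in order of decreasing density until the mass is reached.
    order = sorted(range(grid), key=lambda i: dens[i], reverse=True)
    total, kept = 0.0, []
    for i in order:
        kept.append(i)
        total += dens[i] / grid
        if total >= mass:
            break
    # The Beta posterior is unimodal, so the kept cells form one interval.
    return {"mle": k / n, "mean": a / (a + b),
            "hdi": (xs[min(kept)], xs[max(kept)])}


# Hypothetical adjusted counts: 30 of 35 phylogenies support linkage.
print(posterior_summary(k=30, n=35))
```

The same function can be applied to μ_ij with k_U and n, and to δ_ij with $k_{i \to j}$ and $k_{i \to j} + k_{j \to i}$. Because overlap-adjusted counts need not be integers, the sketch works with the Gamma function rather than binomial coefficients.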

Most likely transmission chains

Pairs of individuals between whom transmission was not excluded (when $\hat \mu _{ij} > 0.6$) defined a set of connected graphs, which we call (partially observed) transmission networks. For each network, we defined its adjacency matrix with entries $\hat \tau _{ij} = k_{i \to j} + k_{i\sim j}/2$ for i ≠ j and $\hat \tau _{ij} = 0$ otherwise. Every spanning tree c of a network defines a possible transmission chain, and was associated with a transmission flow score over its directed edges, $\hat \tau _c = \mathop {\prod }\nolimits_{ij \in c} \hat \tau _{ij}$. The most likely transmission chain, defined by $\hat c^{{\mathrm {ML}}} = {\mathrm{argmax}}_c\,\hat \tau _c$, was calculated with Edmonds’s algorithm as implemented in the RBGL R package, version 1.55.1 ⁵⁶.
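To illustrate what the most likely transmission chain means, the sketch below enumerates all directed spanning trees of a very small network and keeps the one with the largest product of edge scores. It is a brute-force stand-in used here for clarity, not Edmonds’s algorithm and not the RBGL implementation, and it is only practical for a handful of individuals. The function names and the scores in the example are hypothetical.

```python
from itertools import product


def _reaches_root(v, parent, root):
    """Follow parent links from v; True if the root is reached without a cycle."""
    seen = set()
    while v != root:
        if v in seen:
            return False
        seen.add(v)
        v = parent[v]
    return True


def most_likely_chain(nodes, tau):
    """Return (score, edges) of the directed spanning tree maximising the
    product of edge scores; tau[(i, j)] scores edge i -> j, missing edges
    score 0."""
    best_score, best_edges = 0.0, None
    for root in nodes:
        others = [v for v in nodes if v != root]
        # Every non-root individual is assigned exactly one source.
        for parents in product(nodes, repeat=len(others)):
            parent = dict(zip(others, parents))
            if any(parent[v] == v for v in others):
                continue
            if not all(_reaches_root(v, parent, root) for v in others):
                continue
            score = 1.0
            for v in others:
                score *= tau.get((parent[v], v), 0.0)
            if score > best_score:
                best_score = score
                best_edges = sorted((parent[v], v) for v in others)
    return best_score, best_edges


# Hypothetical flow scores for a three-person network.
tau = {("A", "B"): 20, ("B", "A"): 5, ("B", "C"): 15,
       ("C", "B"): 10, ("A", "C"): 2, ("C", "A"): 1}
print(most_likely_chain(["A", "B", "C"], tau))
```

In this example the chain A → B → C has the largest score, 20 × 15 = 300.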

Classification of linked pairs and sources

Pairs in most likely transmission chains were classified as (epidemiologically) linked when $\hat \lambda _{ij} = k_{\mathrm L}/n > c$, where n is as above and c = 0.6, and otherwise as potentially linked. The threshold c was determined as follows. Under model (1), k_L ~ Binomial (n, λ_ij), where λ_ij indicates the strength of phylogenetic evidence for linkage. The threshold c was motivated by the condition that the posterior probability for λ_ij > 50% should be larger than α = 80% or alternatively α = 95%, i.e.

$$p\left( {\lambda _{ij} > 0.5{\mathrm{|}}k_{\mathrm L},n} \right) > \alpha .$$

(3)

We simplified this criterion by choosing c ∈ (0, 1) such that Eq. (3) holds for all k_L > nc for a typical whole-genome analysis. For the Rakai analysis, read alignments had a length of 250 bp, resulting in n = 35 non-overlapping alignments and deep-sequence phylogenies, and so with Eq. (2), we obtain c = 0.57 for α = 80% and c = 0.64 for α = 95%. The thresholds were similar for analyses based on read alignments of length 350 bp, resulting in n = 25 deep-sequence phylogenies, and c = 0.59 for α = 80% and c = 0.67 for α = 95%. This suggested choosing as default values c = 0.6 for α = 80% and c = 0.66 for α = 95%, with the present analysis based on c = 0.6 for all linkage and direction classifications.
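The link between Eq. (3) and the threshold c can be explored numerically. The sketch below is a simplification that restricts k_L to integer counts, whereas overlap-adjusted counts need not be integers, so it is not guaranteed to reproduce the published thresholds exactly. It relies on the standard identity that, for integer parameters, the Beta(k + 1, n − k + 1) distribution function at 0.5 equals an upper tail probability of a Binomial(n + 1, 0.5) variable. The function names are illustrative.

```python
from math import comb


def prob_lambda_above_half(k: int, n: int) -> float:
    """P(lambda > 0.5 | k, n) under the flat-prior posterior Beta(k+1, n-k+1).

    For integer counts this equals P(X <= k) with X ~ Binomial(n + 1, 0.5).
    """
    return sum(comb(n + 1, x) for x in range(k + 1)) / 2 ** (n + 1)


def linkage_threshold(n: int, alpha: float) -> float:
    """Largest c of the form (k - 1)/n such that every integer count k > n*c
    satisfies P(lambda > 0.5 | k, n) > alpha."""
    for k in range(n + 1):
        if prob_lambda_above_half(k, n) > alpha:
            return max(k - 1, 0) / n
    raise ValueError("no count satisfies the criterion for this alpha")


for n in (35, 25):
    for alpha in (0.80, 0.95):
        print(n, alpha, round(linkage_threshold(n, alpha), 2))
```

Because the posterior probability increases with k_L, the first count that satisfies Eq. (3) determines the threshold; a higher α can only raise it.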

Reporting Summary

Further information on experimental design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The deep-sequence phylogenies and basic individual-level data analysed during the current study are available in the Dryad repository, https://doi.org/10.5061/dryad.7h46hg2. HIV-1 reads are available on reasonable request through the PANGEA consortium (www.pangea-hiv.org) or the corresponding author. Please contact project manager Lucie Abeler-Dörner (lucie.abeler-dorner@bdi.ox.ac.uk) for further details. Additional individual-level data are available on reasonable request to RHSP or the corresponding author.

Code availability

Code is available from https://github.com/BDI-pathogens/phyloscanner (version 1.1.2) and https://github.com/olli0601/Phyloscanner.R.utilities (version 0.7) under the GNU General Public License v3.0.

References

1. UNAIDS. UNAIDS Data 2017, Document JC2910E. http://www.unaids.org/en/resources/documents/2017/2017_data_book (2017).
2. Grabowski, M. K. et al. HIV prevention efforts and incidence of HIV in Uganda. N. Engl. J. Med. 377, 2154–2166 (2017).
3. UNAIDS. Fast-track: ending the AIDS epidemic by 2030, Document JC2686. http://www.unaids.org/en/resources/documents/2014/JC2686_WAD2014report (2014).
4. UNAIDS. Empower young women and adolescent girls: fast-track the end of the AIDS epidemic in Africa, Document JC2746. http://www.unaids.org/en/resources/documents/2015/JC2746 (2015).
5. Salazar-Gonzalez, J. F. et al. Deciphering human immunodeficiency virus type 1 transmission and early envelope diversification by single-genome amplification and sequencing. J. Virol. 82, 3952–3970 (2008).
6. Maldarelli, F. et al. HIV populations are large and accumulate high genetic diversity in a nonlinear fashion. J. Virol. 87, 10313–10323 (2013).
7. Dennis, A. M. et al. Phylogenetic studies of transmission dynamics in generalized HIV epidemics: an essential tool where the burden is greatest? J. Acquir. Immune Defic. Syndr. 67, 181–195 (2014).
8. Pillay, D. et al. PANGEA-HIV: phylogenetics for generalised epidemics in Africa. Lancet Infect. Dis. 15, 259–261 (2015).
9. Volz, E. et al. HIV-1 transmission during early infection in men who have sex with men: a phylodynamic analysis. PLoS Med. 10, e1001568 (2013).
10. Stadler, T., Kuhnert, D., Bonhoeffer, S. & Drummond, A. J. Birth-death skyline plot reveals temporal changes of epidemic spread in HIV and hepatitis C virus (HCV). Proc. Natl. Acad. Sci. USA 110, 228–233 (2013).
11. Grabowski, M. K. et al. The role of viral introductions in sustaining community-based HIV epidemics in rural Uganda: evidence from spatial clustering, phylogenetics, and egocentric transmission models. PLoS Med. 11, e1001610 (2014).
12. de Oliveira, T. et al. Transmission networks and risk of HIV infection in KwaZulu-Natal, South Africa: a community-wide phylogenetic study. Lancet HIV 4, e41–e50 (2017).
13. Le Vu, S. et al. Comparison of cluster-based and source-attribution methods for estimating transmission risk using large HIV sequence databases. Epidemics 23, 1–10 (2018).
14. Barre-Sinoussi, F. et al. Expert consensus statement on the science of HIV in the context of criminal law. J. Int. AIDS Soc. 21, e25161 (2018).
15. Ratmann, O. et al. Sources of HIV infection among men having sex with men and implications for prevention. Sci. Transl. Med. 8, 320ra2 (2016).
16. Eshleman, S. H. et al. Analysis of genetic linkage of HIV from couples enrolled in the HIV Prevention Trials Network 052 trial. J. Infect. Dis. 204, 1918–1926 (2011).
17. Campbell, M. S. et al. Viral linkage in HIV-1 seroconverters and their partners in an HIV-1 prevention clinical trial. PLoS ONE 6, e16986 (2011).
18. Volz, E. M. et al. Molecular epidemiology of HIV-1 subtype B reveals heterogeneous transmission risk: implications for intervention and control. J. Infect. Dis. 217, 1522–1529 (2018).
19. Didelot, X., Fraser, C., Gardy, J. & Colijn, C. Genomic infectious disease epidemiology in partially sampled and ongoing outbreaks. Mol. Biol. Evol. 34, 997–1007 (2017).
20. Grabowski, M. K. & Lessler, J. Phylogenetic insights into age-disparate partnerships and HIV. Lancet HIV 4, e8–e9 (2017).
21. Wymant, C. et al. PHYLOSCANNER: inferring transmission from within- and between-host pathogen genetic diversity. Mol. Biol. Evol. 35, 719–733 (2017).
22. Romero-Severson, E. O., Bulla, I. & Leitner, T. Phylogenetically resolving epidemiologic linkage. Proc. Natl. Acad. Sci. USA 113, 2690–2695 (2016).
23. Leitner, T. & Romero-Severson, E. Phylogenetic patterns recover known HIV epidemiological relationships and reveal common transmission of multiple variants. Nat. Microbiol. 3, 983–988 (2018).
24. Serwadda, D. et al. Slim disease: a new disease in Uganda and its association with HTLV-III infection. Lancet 2, 849–852 (1985).
25. Chang, L. W. et al. Heterogeneity of the HIV epidemic in agrarian, trading, and fishing communities in Rakai, Uganda: an observational epidemiological study. Lancet HIV 3, e388–e396 (2016).
26. Grabowski, M. K. et al. The validity of self-reported antiretroviral use in persons living with HIV: a population-based study. AIDS 32, 363–369 (2018).
27. Gall, A. et al. Universal amplification, next-generation sequencing, and assembly of HIV-1 genomes. J. Clin. Microbiol. 50, 3838–3844 (2012).
28. Ratmann, O. et al. HIV-1 full-genome phylogenetics of generalized epidemics in sub-Saharan Africa: impact of missing nucleotide characters in next-generation sequences. AIDS Res. Hum. Retroviruses 33, 1083–1098 (2017).
29. Rose, R. et al. Identifying transmission clusters with cluster picker and HIV-TRACE. AIDS Res. Hum. Retroviruses 33, 211–218 (2017).
30. Romero-Severson, E. O. et al. Donor-recipient identification in para- and poly-phyletic trees under alternative HIV-1 transmission hypotheses using approximate Bayesian computation. Genetics 207, 1089–1101 (2017).
31. Carlson, J. M. et al. HIV transmission. Selection bias at the heterosexual HIV-1 transmission bottleneck. Science 345, 1254031 (2014).
32. Hue, S. et al. HIV type 1 in a rural coastal town in Kenya shows multiple introductions with many subtypes and much recombination. AIDS Res. Hum. Retroviruses 28, 220–224 (2012).
33. Novitsky, V. et al. Phylogenetic relatedness of circulating HIV-1C variants in Mochudi, Botswana. PLoS ONE 8, e80589 (2013).
34. Chan, S. K. et al. Likely female-to-female sexual transmission of HIV–Texas, 2012. MMWR Morb. Mortal. Wkly. Rep. 63, 209–212 (2014).
35. Fraser, C. et al. Virulence and pathogenesis of HIV-1 infection: an evolutionary perspective. Science 343, 1243727 (2014).
36. Hladik, W. et al. Men who have sex with men in Kampala, Uganda: Results from a bio-behavioral respondent driven sampling survey. AIDS Behav. 21, 1478–1490 (2017).
37. Rose, R. et al. Phylogenetic methods inconsistently predict direction of HIV transmission among heterosexual pairs in the HPTN052 cohort. J. Infect. Dis., https://doi.org/10.1093/infdis/jiy734 (2018).
38. De Silva, D. et al. Whole-genome sequencing to determine transmission of Neisseria gonorrhoeae: an observational study. Lancet Infect. Dis. 16, 1295–1303 (2016).
39. Fifer, H. et al. Sustained transmission of high-level azithromycin-resistant Neisseria gonorrhoeae in England: an observational study. Lancet Infect. Dis. 18, 573–581 (2018).
40. Dellicour, S. et al. Phylodynamic assessment of intervention strategies for the West African Ebola virus outbreak. Nat. Commun. 9, 2222 (2018).
41. Poon, A. F. et al. Near real-time monitoring of HIV transmission hotspots from routine HIV genotyping: an implementation case study. Lancet HIV 3, e231–e238 (2016).
42. Oster, A. M., France, A. M. & Mermin, J. Molecular epidemiology and the transformation of HIV prevention. JAMA 319, 1657–1658 (2018).
43. Skums, P. et al. QUENTIN: reconstruction of disease transmissions from viral quasispecies genomic data. Bioinformatics 34, 163–170 (2018).
44. Bernard, E. J., Cameron, S., HIV Justice Network & GNP+. Advancing HIV Justice 2: Building momentum in global advocacy against HIV criminalisation. http://www.hivjustice.net/wp-content/uploads/2016/05/AHJ2.final2_.10May2016.pdf (2016).
45. Yebra, G. et al. Using nearly full-genome HIV sequence data improves phylogeny reconstruction in a simulated epidemic. Sci. Rep. 6, 39489 (2016).
46. Novitsky, V. et al. Long-range HIV genotyping using viral RNA and proviral DNA for analysis of HIV drug resistance and HIV clustering. J. Clin. Microbiol. 53, 2581–2592 (2015).
47. Bonsall, D. et al. A comprehensive genomics solution for HIV surveillance and clinical monitoring in a global health setting. Preprint at bioRxiv, https://www.biorxiv.org/content/early/2018/08/23/397083 (2018).
48. Sypsa, V. et al. Rapid decline in HIV incidence among persons who inject drugs during a fast-track combination prevention program after an HIV outbreak in Athens. J. Infect. Dis. 215, 1496–1505 (2017).
49. Chewapreecha, C. et al. Dense genomic sampling identifies highways of pneumococcal recombination. Nat. Genet. 46, 305–309 (2014).
50. Paterson, G. K. et al. Capturing the cloud of diversity reveals complexity and heterogeneity of MRSA carriage, infection and transmission. Nat. Commun. 6, 6560 (2015).
51. Wymant, C. et al. Easy and accurate reconstruction of whole HIV genomes from short-read sequence data. Virus Evol. 4, vey007 (2018).
52. Hunt, M. et al. IVA: accurate de novo assembly of RNA virus genomes. Bioinformatics 31, 2374–2376 (2015).
53. Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477 (2012).
54. Nurk, S., Meleshko, D., Korobeynikov, A. & Pevzner, P. A. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 27, 824–834 (2017).
55. Wood, D. E. & Salzberg, S. L. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 15, R46 (2014).
56. Carey, V., Long, L. & Gentleman, R. RBGL: an interface to the BOOST graph library, version 1.55.1. http://bioconductor.org/packages/release/bioc/html/RBGL.html (2017).

Acknowledgements

We thank the participants of the RHSP RCCS; as well as the PANGEA-HIV steering committee for their input and their comments on a previous version of this article. Computations were performed at the Imperial College Research Computing Service, https://doi.org/10.14469/hpc/2232. This study was supported by the National Institute of Mental Health (K23MH086338, R01MH107275); the National Institute of Allergy and Infectious Diseases (R01AI110324, U01AI100031, R01AI110324, R01AI102939); the National Institute of Child Health and Development (RO1HD070769, R01HD050180); the Division of Intramural Research, National Institute for Allergy and Infectious Diseases, National Institutes of Health; the Bill & Melinda Gates Foundation (22006.02, OPP1084362); the Johns Hopkins University Center for AIDS Research (P30AI094189); and the European Research Council (Advanced Grant PBDR-339251).

Author information

Authors and Affiliations

Department of Mathematics, Imperial College London, London, SW72AZ, UK
Oliver Ratmann
Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, W21PG, UK
Oliver Ratmann & Chris Wymant
Department of Medicine, Johns Hopkins School of Medicine, Baltimore, MD, 21205-2196, USA
M. Kate Grabowski, Thomas C. Quinn, Oliver Laeyendecker, Ronald H. Gray, Steven J. Reynolds, Larry W. Chang & Andrew D. Redd
Rakai Health Sciences Program, Entebbe, P.O.Box 49, Uganda
M. Kate Grabowski, Joseph Kagaayi, Godfrey Kigozi, Maria J. Wawer, David Serwadda, Ronald H. Gray, Gertrude Nakigozi, Robert Ssekubugu, Fred Nalugoda, Tom Lutalo, Ronald Galiwango, Fred Makumbi, Nelson K. Sewankambo, Aaron A. R. Tobian, Steven J. Reynolds, Larry W. Chang, Dorean Nabukalu, Anthony Ndyanabo, Joseph Ssekasanvu, Hadijja Nakawooya, Jessica Nakukumba, Grace N. Kigozi, Betty S. Nantume, Nampijja Resty, Jedidah Kambasu, Margaret Nalugemwa, Regina Nakabuye, Lawrence Ssebanobe, Justine Nankinga, Adrian Kayiira, Gorreth Nanfuka, Ruth Ahimbisibwe, Stephen Tomusange, Ronald M. Galiwango, Sarah Kalibbali, Margaret Nakalanzi, Joseph Ouma Otobi, Denis Ankunda, Joseph Lister Ssembatya, John Baptist Ssemanda, Robert Kairania, Emmanuel Kato, Alice Kisakye, James Batte, James Ludigo, Abisagi Nampijja, Steven Watya, Kighoma Nehemia, Margaret Anyokot Sr., Joshua Mwinike, George Kibumba, Paschal Ssebowa, George Mondo, Francis Wasswa, Agnes Nantongo, Rebecca Kakembo, Josephine Galiwango, Geoffrey Ssemango, Andrew D. Redd, John Santelli, Caitlin E. Kennedy & Jennifer Wagman
Oxford Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, Old Road Campus, University of Oxford, Oxford, OX3 7BN, UK
Matthew Hall, Tanya Golubchik, Chris Wymant, Lucie Abeler-Dörner, David Bonsall & Christophe Fraser
Division of Infection and Immunity, University College London, London, WC1E 6BT, UK
Anne Hoppe, Deenan Pillay & Daniel Frampton
School of Biological Sciences, University of Edinburgh, Edinburgh, EH9 3FF, UK
Andrew Leigh Brown & Andrew Rambaut
College of Health Sciences, University of KwaZulu-Natal, Durban, 4041, South Africa
Tulio de Oliveira
European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
Astrid Gall
Department of Medicine, Imperial College London, London, W12 0HS, UK
Paul Kellam & Sarah Fidler
Africa Health Research Institute, Private Bag X7, Durban, 4013, South Africa
Deenan Pillay & Frank Tanser
Division of Intramural Research, National Institute of Allergy and Infectious Diseases, NIH, Bethesda, MD, 20892-9806, USA
Thomas C. Quinn & Oliver Laeyendecker
Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA
Maria J. Wawer, Ronald H. Gray & Joseph Ssekasanvu
Makerere University School of Public Health, Kampala, 8HQG+3V, Uganda
David Serwadda
Zambart Project, Lusaka, P.O. Box 50697, Zambia
Helen Ayles
Oxford Genomics Centre, The Wellcome Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK
Rory Bowden
Ecole Normale Supérieure de Lyon, Lyon, 69007, France
Vincent Calvez
Department of Medicine, University of North Carolina, Chapel Hill, NC, 27516, USA
Myron Cohen & Ann Dennis
Harvard T.H. Chan School of Public Health AIDS Initiative, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
Max Essex
Botswana Harvard AIDS Institute Partnership, Gaborone, Private Bag BO 320, Botswana
Max Essex
London School of Hygiene & Tropical Medicine, London, WC1E 7HT, UK
Richard Hayes & Janet Seeley
Department of Global Health, University of Washington, Seattle, WA, 98104, USA
Joshua T. Herbeck & Jairam Lingappa
MRC/UVRI, Entebbe, P.O.Box 49, Uganda
Pontiano Kaleebu & Deogratius Ssemwanga
Joint Clinical Research Centre, Kampala, P.o.Box 10005, Uganda
Cissy Kityo
Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
Vladimir Novitsky
Medical Research Council, London, WC2B 4AN, UK
Nick Paton
Mailman School of Public Health, Columbia University, New York, NY, 10032, USA
John Santelli
School of Medicine, University of California San Diego, San Diego, CA, 92093, USA
Jennifer Wagman

Authors

Oliver Ratmann
M. Kate Grabowski
Matthew Hall
Tanya Golubchik
Chris Wymant
Lucie Abeler-Dörner
David Bonsall
Anne Hoppe
Andrew Leigh Brown
Tulio de Oliveira
Astrid Gall
Paul Kellam
Deenan Pillay
Joseph Kagaayi
Godfrey Kigozi
Thomas C. Quinn
Maria J. Wawer
Oliver Laeyendecker
David Serwadda
Ronald H. Gray
Christophe Fraser

Consortia

PANGEA Consortium and Rakai Health Sciences Program

Helen Ayles, Rory Bowden, Vincent Calvez, Myron Cohen, Ann Dennis, Max Essex, Sarah Fidler, Daniel Frampton, Richard Hayes, Joshua T. Herbeck, Pontiano Kaleebu, Cissy Kityo, Jairam Lingappa, Vladimir Novitsky, Nick Paton, Andrew Rambaut, Janet Seeley, Deogratius Ssemwanga, Frank Tanser, Gertrude Nakigozi, Robert Ssekubugu, Fred Nalugoda, Tom Lutalo, Ronald Galiwango, Fred Makumbi, Nelson K. Sewankambo, Aaron A. R. Tobian, Steven J. Reynolds, Larry W. Chang, Dorean Nabukalu, Anthony Ndyanabo, Joseph Ssekasanvu, Hadijja Nakawooya, Jessica Nakukumba, Grace N. Kigozi, Betty S. Nantume, Nampijja Resty, Jedidah Kambasu, Margaret Nalugemwa, Regina Nakabuye, Lawrence Ssebanobe, Justine Nankinga, Adrian Kayiira, Gorreth Nanfuka, Ruth Ahimbisibwe, Stephen Tomusange, Ronald M. Galiwango, Sarah Kalibbali, Margaret Nakalanzi, Joseph Ouma Otobi, Denis Ankunda, Joseph Lister Ssembatya, John Baptist Ssemanda, Robert Kairania, Emmanuel Kato, Alice Kisakye, James Batte, James Ludigo, Abisagi Nampijja, Steven Watya, Kighoma Nehemia, Margaret Anyokot Sr., Joshua Mwinike, George Kibumba, Paschal Ssebowa, George Mondo, Francis Wasswa, Agnes Nantongo, Rebecca Kakembo, Josephine Galiwango, Geoffrey Ssemango, Andrew D. Redd, John Santelli, Caitlin E. Kennedy & Jennifer Wagman

Contributions

O.R., M.K.G., A.L.B., TdO, P.K., D.P., T.C.Q., M.J.W., D.S., R.H.G., C.F. conceived the study; M.K.G., J.K., G.K., O.L., T.C.Q., M.J.W., D.S., R.H.G., A.G., D.B. selected, provided and prepared sequence and patient data; L.A.-D., A.H., T.G. provided managerial and logistical support, including data tracking; M.K.G., C.W., T.G. assembled deep-sequence reads; O.R., M.H. performed computations and statistical analyses; O.R., M.K.G., C.F. evaluated statistical analyses; O.R. wrote the first version of the manuscript; all authors reviewed and approved the statistical analysis and final version of the manuscript.

Corresponding author

Correspondence to Oliver Ratmann.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Journal peer review information: Nature Communications thanks Denise Kühnert, Thomas Leitner, and the anonymous reviewers for their contribution to the peer review of this work.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

^# A full list of consortium members appears at the end of the paper.

Supplementary information

Supplementary Information

Description of Additional Supplementary Files

Supplementary Data 1

Supplementary Data 2

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Ratmann, O., Grabowski, M.K., Hall, M. et al. Inferring HIV-1 transmission networks and sources of epidemic spread in Africa with deep-sequence phylogenetic analysis. Nat Commun 10, 1411 (2019). https://doi.org/10.1038/s41467-019-09139-4

Received: 05 October 2018
Accepted: 22 February 2019
Published: 29 March 2019
DOI: https://doi.org/10.1038/s41467-019-09139-4
