Skip to main content

Thank you for visiting You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

HIV-1 Transmission Patterns Within and Between Risk Groups in Coastal Kenya


HIV-1 transmission patterns within and between populations at different risk of HIV-1 acquisition in Kenya are not well understood. We investigated HIV-1 transmission networks in men who have sex with men (MSM), injecting drug users (IDU), female sex workers (FSW) and heterosexuals (HET) in coastal Kenya. We used maximum-likelihood and Bayesian phylogenetics to analyse new (N = 163) and previously published (N = 495) HIV-1 polymerase sequences collected during 2005–2019. Of the 658 sequences, 131 (20%) were from MSM, 58 (9%) IDU, 109 (17%) FSW, and 360 (55%) HET. Overall, 206 (31%) sequences formed 61 clusters. Most clusters (85%) consisted of sequences from the same risk group, suggesting frequent within-group transmission. The remaining clusters were mixed between HET/MSM (7%), HET/FSW (5%), and MSM/FSW (3%) sequences. One large IDU-exclusive cluster was found, indicating an independent sub-epidemic among this group. Phylodynamic analysis of this cluster revealed a steady increase in HIV-1 infections among IDU since the estimated origin of the cluster in 1987. Our results suggest mixing between high-risk groups and heterosexual populations and could be relevant for the development of targeted HIV-1 prevention programmes in coastal Kenya.


Approximately 5.6% in the population in Kenya are infected by HIV-1, with a more than three-fold higher HIV-1 prevalence among so-called high-risk groups – including men who have sex with men (MSM), injecting drug users (IDU) and female sex workers1,2,3,4. Modelling data on modes of HIV-1 transmission in Kenya have shown that at least one-third of all new infections occur among high-risk groups5. However, little is known about local HIV-1 networks and transmission within and between high-risk groups and the heterosexual (HET) population in African settings6. Molecular epidemiology studies in coastal Kenya have described a dynamic HIV-1 epidemic characterised by subtypes A, C, D, and different circulating recombinant forms (CRFs)6,7,8,9,10,11,12,13,14,15. These studies have indicated high proportions of recombinants within HET, but this was not evident among MSM in a recent study8, alluding to separate epidemics. One study in coastal Kenya observed similar HIV-1 recombination patterns among HIV-1 strains in MSM and FSW, suggesting reinfections within mixed networks13.

HIV-1 transmission dynamics can be assessed by linking socio-demographic, clinical and behavioural data with HIV-1 sequence data by phylogenetics16,17. With few exceptions, most phylogenetic studies of the HIV-1 epidemic in sub-Saharan Africa have focused on understanding HIV-1 subtype diversity and prevalence of antiretroviral resistance mutations18,19,20,21,22,23,24. Phylogenetic studies highlighting the dynamics of HIV-1 transmission and contribution of high-risk groups to onward viral transmission are common in North America and Europe, where largescale HIV-1 sequence data are available25,26,27,28,29,30,31,32. Due to the limited availability of HIV-1 sequences from sub-Saharan Africa, only a few phylogenetic studies have assessed the dynamics of the HIV-1 epidemic in the region33,34,35. Transmission networks studies in Kenya have demonstrated clustering of MSM sequences with evidence of transmission between different geographical regions, and limited mixing between MSM and HET6,8,13. We have also demonstrated extensive clustering of HIV-1 pol sequences from MSM who have sex with only men and MSM who have sex with both men and women in coastal Kenya8. Given that many MSM on the coast of Kenya have sex with both men and women, there is a possibility of HIV-1 transmission linkages between MSM and the local HET community4. The primary objective of the current study was to investigate transmission networks within and between MSM, IDU, FSW, and HET in coastal Kenya using both newly generated and previously published HIV-1 pol sequences. A secondary objective was to determine HIV-1 genetic diversity among different risk groups in coastal Kenya.


Study Population, sequence dataset and sampling density

The analysed 658 coastal Kenyan HIV-1 partial pol sequences included both newly generated (N = 163) and previously published sequences (N = 495). Sequences were collected during 2005–2019 in the Mombasa (N = 210, 32%) and Kilifi (N = 448, 68%) counties in coastal Kenya (Table 1). The risk groups included MSM (N = 131, 20%), IDU (N = 58, 9%), FSW (N = 109, 17%), and HET (N = 360, 55%). Based on size estimation of risk groups and the number of infected populations infected with HIV-1 in Mombasa and Kilifi counties36, our study was more powered to pick out MSM (sampling density, 51%) and IDU (sampling density, 12%) clusters compared with FSW (sampling density 3%) and HET (sampling density, 1%) clusters (Supplementary Table S1).

Table 1 Demographics and distribution of newly generated and published coastal Kenya HIV-1 pol sequences by risk-group.

HIV-1 subtypes A, C, and D dominated the epidemic in coastal Kenya

Phylogenetic analysis was used to determine the subtype distribution in the full coastal Kenya sequence dataset (N = 658). In total, 431 subtype A (66%), 46 subtype C (7%), 69 subtype D (10%), 2 subtype G (>1%), and 110 CRF and unique recombinant form (URFs, 17%) sequences were identified (Table 1 and Supplementary Fig. S1). In addition, all subtype A sequences belonged to sub-subtype A1. Detailed recombination analyses of newly generated sequences demonstrated extensive recombination between subtypes, sub-subtypes, and recombinant forms (Supplementary Table S2).

Identification of coastal Kenya-specific HIV-1 transmission clusters

Maximum-likelihood (ML) phylogenies were reconstructed independently for the most prevalent HIV-1 subtypes in the population (subtypes A, C, and D). Reference sequences were obtained from GenBank based on similarity (whereof 731 participant-unique sub-subtype A1 sequences remained after removal of redundancies; 256 for subtype C; and 92 for subtype D).

Transmission networks were classified based on the number of sequences per cluster into dyads (2 sequences), networks (3–14 sequences), and large clusters (>14). Of the 658 coastal Kenyan sequences, 206 sequences (31%) formed 61 statistically supported clusters (size range: 2–41 sequences). These included 39 dyads (64% of all clusters), 21 networks (34%), and one large cluster (2%) (Table 2 and Supplementary Table S3). Most of the clusters were found among the subtype A sequences (N = 50, 82%), followed by the subtype D sequences (N = 7, 11%), and the subtype C sequences (N = 4, 7%) (Supplementary Fig. S2).

Table 2 The number of coastal Kenyan transmission clusters by cluster size and risk group.

Risk-group specific clustering patterns

Stratification by risk group showed two distinct clustering patterns (Fig. 1). The first pattern represented exclusive within-risk group clustering, where sequences in a cluster belonged exclusively to one specific risk group. Compared to HET sequences, MSM and IDU sequences were more likely to cluster (adjusted odds ratio [aOR] 25.9, 95% confidence interval [CI] 10–63.9, P < 0.001; and aOR 31.5, CI 12.2–81.6, P < 0.001, respectively, Table 3). Of the 61 clusters observed, 85% were risk group exclusive clusters. These included 24 MSM clusters (11 dyads and 13 networks), four IDU clusters (three dyads and one large cluster), six FSW clusters (six dyads), and 18 HET clusters (13 dyads and five networks). The majority of the MSM sequences (N = 84, 64%) formed small independent clusters ranging in size from two to nine sequences per cluster. Likewise, the majority of IDU sequences (N = 47, 82%) formed clusters. Interestingly, most of the clustering IDU sequences were of sub-subtype A1 and formed one single large cluster (N = 41, 80%). In contrast, only a small proportion of the FSW (N = 22, 20%) and HET (N = 67, 18%) sequences formed risk group-exclusive clusters.

Figure 1

Clustering patterns of different risk groups in coastal Kenya. Representative clusters selected to highlight typical clustering patterns of the different risk groups. The branches are coloured according to the different risk groups (Bluish- green: MSM; Sky blue: IDU; Vermillion: FSW; Yellow: HET, and Black: reference sequences). As an overview, MSM formed several small clusters ranging in size from two to nine sequences per cluster (A). Most IDU sequences (N = 41) formed one single large cluster (B). In contrast, FSW and HET clusters were small (mostly dyads containing two sequences), although most FSW and HET sequences existed as single sequences or clustered with reference sequences (C). Asterisks have been used to highlight branches leading to significantly supported clusters (aLRT-SH branch support of ≥0.9).

Table 3 Factors associated with clustering among 546 subtypes A1, C, and D HIV-1 sequences from MSM, IDU, FSW, and HET individuals from coastal Kenya.

In addition to risk group-exclusive clustering, 15% of all clusters were mixed between risk-groups (Table 2): Two (3%) MSM/FSW dyads, four (7%) MSM/HET mixed clusters (two dyads and two networks), and three (5%) FSW/HET clusters (two dyads and one network). Of relevance, both MSM/HET networks (one sub-subtype A1 and one subtype D) had sequences from four MSM and one HET female. Moreover, of the eleven MSM sequences that were found in mixed clusters, eight (73%) reported sex work in the three months preceding sample collection, and 10 (91%) reported bisexual behaviour. The FSW/HET network consisted of two FSW sequences and one HET male sequence.

Most sequences from coastal Kenya clustered exclusively with sequences of Kenyan origin. Only three clusters with sequences from coastal Kenya had links to published sequences from outside coastal Kenya. One sequence in an MSM cluster was from an MSM from Nairobi. One sequence in another MSM cluster was from a German MSM, and the last sequence was found in a mixed MSM/HET cluster and was from a Canadian individual of unknown gender (Supplementary Table S3).

Genetic diversity between clusters of different risk groups

To further dissect differences in clustering patterns between risk groups, we determined the average genetic diversity in the identified clusters. A previously described ML bootstrap approach was employed to determine the genetic diversity (based on 1000 ML bootstrap trees)37. The median genetic diversity was 0.009 substitutions/site (s/s, IQR: 0.005–0.017 s/s) for MSM clusters, 0.03 s/s (IQR: 0.02–0.055 s/s) for IDU clusters, 0.008 s/s (IQR: 1×10–8–0.018 s/s) for FSW clusters, 0.015 s/s (IQR: 0.006–0.023 s/s) for HET clusters, and 0.018 s/s (IQR: 0.013–0.024 s/s) for mixed clusters. A Kruskal-Wallis H test showed that the distribution of genetic diversity differed across the five groups (χ2 = 11.074, four degrees of freedom, P-value = 0.026). A Dunn’s post hoc test for multiple comparisons using rank sums showed a significant difference in diversity between FSW and IDU (mean rank difference = 33.08, adjusted P-value = 0.039, Fig. 2, Supplementary Table S4).

Figure 2

Genetic diversity of different risk group-specific clusters in coastal Kenya. A pirate plot63 illustrating the differences in genetic diversity between MSM, IDU, FSW, HET and Mixed clusters. Black dots represent the median estimates of the genetic diversity per cluster. The group median and the interquartile range diversity estimates are indicated in box plots coloured by risk group (Bluish-green: MSM; Sky blue: IDU; Vermillion: FSW; Yellow: HET; Deep blue: Mixed risk groups).

Analysis of active transmission clusters

Among the 61 clusters defined by risk group, we identified 34 potentially active clusters as (determined by low genetic distance <1.5%), suggesting ongoing transmission at the time of sample collection. Stratification of the potentially active clusters by risk-group showed: Eight MSM dyads and five MSM networks; four IDU dyads; five FSW dyads; and seven HET dyads and two HET networks. Potentially active clusters with sequences from different risk groups included one FSW/HET network and two MSM/HET networks (Supplementary Table S5).

Estimation of time to the most recent common ancestor (tMRCA) and evolutionary rates

To gain insight into the evolutionary dynamics of the identified transmission clusters, we determined the evolutionary rate and the tMRCA of the clusters by Bayesian phylogenetic analysis. The median tMRCA of the 61 coastal Kenya clusters indicated that HIV-1 has been introduced in coastal Kenya several times over a period of 27 years (1985–2012, Supplementary Fig. S3 and Table S3). Only one cluster was large enough to allow for in-depth phylodynamic analysis. This cluster comprised 41 IDU sequences and the tMRCA for this cluster was determined to be 1987 (95% higher posterior density [HPD] interval: 1985–1990) (Fig. 3) with a median evolutionary rate of 6.4 × 10−3 substitutions/site/year (HPD interval: 3.9 × 10−3 − 1.1 × 10−2). The Skygrid analysis indicated that the number of IDU contributing to new HIV-1 infections over time increased gradually from 1987 to 2010.

Figure 3

Population dynamics of the HIV-1 sub-epidemic among injecting drug users in coastal Kenya. A Bayesian Skygrid plot showing population dynamics of the HIV-1 sub-subtype A1 injecting drug users’ sub-epidemic in coastal Kenya. Since the IDU pol sequence alignment did not contain temporal information (all sequences were sampled in 2010), the node height for this cluster was calibrated using information from the tMRCA posterior distribution obtained from dating the origin of subtype A1 Kenyan clusters64. Median estimates of the number of injecting drug users contributing to new infections are shown as a continuous black line. The shaded area represents the 95% higher posterior density intervals of the inferred effective population size.


In this study, we found several HIV-1 links between MSM/HET, HET/FSW, and MSM/FSW, indicating mixing between these risk groups in coastal Kenya. Sequences from HET females in clusters dominated by MSM sequences provided evidence for heterosexual linkages in these clusters. The majority of the MSM in mixed clusters also reported having female sexual partners, indicating that this group, in addition to female sex workers, could be an important transmission link to the HET epidemic4.

Transmission linkages between MSM and HET in coastal Kenya might be expected to some extent, given that sexual interaction between MSM and other risk groups in the community is common2,3,4,6,13. In the only previous study of phylogenetic HIV-1 transmission linkages between MSM and HET in coastal Kenya, Bezemer and colleagues only found one single transmission pair of an MSM and a known female partner. Hence, extensive mixing between the MSM and HET epidemics was not found in that study6. In comparison, our analysis included a higher sampling density and availability of risk-group annotated sequences obtained in more recent years. This likely explains the significantly higher number of mixed clusters between MSM and HET sequences in the current study. In a broader context, our study complements existing research on the role of mixed networks in HIV-1 transmission – both globally and in sub-Saharan Africa26,30,31,38,39,40.

Although we found several instances of mixed clusters, the majority of the coastal Kenya clusters represented within-risk-group HIV-1 transmission. MSM-exclusive and IDU-exclusive clusters were more common than FSW and HET clusters. High rates of clustering among MSM and IDU have been described before, both in our setting and elsewhere, and have been linked to an elevated risk of infection among MSM and IDU within close networks6,8,17,29,30,41. Whereas the MSM sequences were found in several smaller clusters, the vast majority of the IDU sequences formed one large cluster. This suggests that the majority of the HIV-1 IDU epidemic in Coastal Kenya was introduced from one single source followed by a long-term gradual spread within the IDU population – a pattern that distinguishes IDU transmission from that of other risk groups in coastal Kenya. In contrast to previous studies where IDU sequences clustered with very low genetic diversity29,30,42, IDU clusters in our analysis had the highest genetic diversity compared to the other analysed risk groups. This indicates that the elapsed time between HIV-1 infection and sample collection may have been longer among IDUs, and/or that the proportion of collected IDU sequences in relation to the true number of IDUs was lower compared to other risk groups. The underlying reasons for this are unknown and warrant further investigation. However, with or without missing links, such clustering pattern is indicative of long-standing HIV-1 transmission linked with intravenous drug use in coastal Kenya17.

This is the first study in Africa to investigate the evolutionary dynamics of an HIV-1 sub-epidemic among IDU. The phylodynamic analysis indicated a steady increase in the epidemic among IDU in coastal Kenya from 1987 to 2010, with no evidence of a rapid exponential increase in the number infections that is typical of HIV-1 outbreaks among IDUs29,30,42. Still, the gradual increase in infections among IDU is compatible with a known period of increased injection of heroin in the region43. Interestingly, and given a general absence of epidemiological surveillance data, new infections among IDU did not seem to decrease with the national rollout of combination antiretroviral therapy (ART) in 2004. This could be a consequence of the unfavourable climate of stigma, discrimination and hostile legislation associated with IDU and most-at-risk-populations in Kenya, which impedes these populations from accessing medical services including ART1,44. Opioid substitution therapy for IDU, Pre-Exposure Prophylaxis (PrEP) targeting all high-risk groups, and initiation of ART immediately upon diagnosis have all been introduced and scaled up after 2010, when the IDU samples used for this study were collected45,46. As new sequence data are made available, future studies may shed light on the effectiveness of these strategies.

This study had some limitations: First, the identified transmission chains are likely to suffer from missing links due to low sampling density. A low sampling density generally results in reduced clustering of HIV-1 sequences47. Because majority of the studies in the coastal Kenya setting have mostly focused on recruiting MSM participants, FSW and HET in our analysis had low sampling densities and inevitably, several transmission chains may have been missed. Furthermore, it is important to acknowledge that the IDU sequences were from one study, using samples from one setting (Mombasa), and collected over a period of less than one year (2010). It is therefore likely that our findings are not representative of the entire IDU epidemic in coastal Kenya. Larger studies of HIV-1 transmission in the IDU population in coastal Kenya are therefore warranted; still we found clear branching patterns indicative of long-standing HIV-1 transmission associated with intravenous drug use. Second, skewed sampling between risk groups may result in overrepresentation of some types of risk group-specific and mixed clusters. Third, given that annotating sequences from sub-Saharan Africa with information about transmission risk factor has become common only in recent years, some published sequences used in this analysis lacked risk data and were assigned HET (by far the most dominant route of HIV-1 transmission in coastal Kenya). However, the risk group for nodes within a cluster that had inadequate annotation can often be deduced from association with nodes with a known risk group48. Since none of the presumed heterosexual sequences in this study was found in mixed clusters, it is unlikely that this potential caveat had any effect on our conclusions. Finally, few HIV-1 pol sequences were available after the year 2010 for all risk groups. This limited our analysis to the representation of some risk groups by the year and area of sampling, further restraining characterisation of recent clusters and ongoing transmission chains.

In conclusion, we demonstrated that there is HIV-1 mixing between high-risk groups and heterosexual populations in coastal Kenya, with frequent within-risk-group transmission. We highlight that high-risk groups could contribute to the epidemic either through seeding and maintaining new infections within their own risk group or through linking infections across different risk groups. It is possible that HIV-1 prevention programmes targeting FSW, MSM and IDU populations could reduce overall HIV-1 transmission in coastal Kenya. As more sequences become available from wider geographic regions, further and larger studies with uniform sampling densities across different risk groups will be necessary to estimate the impact of mixing between risk groups and the general population on HIV-1 spread and to determine the source populations that could most effectively be targeted to mitigate new infections in sub-Saharan Africa.


Study population and sequence dataset

All published HIV-1 pol sequences available in the HIV database at the Los Alamos National Laboratory (LANL) originating from coastal Kenya and collected 2005–2019 retrieved49. Sequences sampled from the same individuals were excluded from the data set, retaining only the oldest sequence per participant. Risk group information was obtained from LANL and any missing data were obtained by communication with study authors or inferred from reviewing literature from the respective studies6,8,9,10,11,12,13,14,15. In addition, new HIV-1 pol sequences were generated from samples collected 2005–2019 participants in an acute HIV-1 infection cohort and from a prospective observational study following high-risk volunteers in an HIV-1 vaccine feasibility cohort described elsewhere3. All new sequences were collected from treatment-naïve men and women aged 18 years and over. HIV-1 diagnosis for samples collected before 2016 was done using two rapid antibody tests in parallel (Determine, Abbott Laboratories; Unigold, Trinity Biotech), with conflicting results resolved by an enzyme-linked immunosorbent assay (ELISA, Genetic System HIV-1/2 plus O EIA; Bio-Rad). HIV-1 diagnosis for samples collected after 2016 was made using the GeneXpert HIV-1 Qual (Cepheid, Sunnyvale, CA, USA).

Sequences were annotated by date of sample collection, geographical area (Mombasa or Kilifi county), and by risk group. Sequences were classified into: MSM (men who reported having sex with men); IDU (individuals who reported injecting drugs with a needle and syringe); FSW (females who reported ever receiving money, gifts, or favours in exchange for sex); and HET (all other individuals [both male and female] who did not report engaging in sex work, male same-sex behaviour or injection drug use).

RNA extraction, amplification of HIV-1 pol region and sequencing

HIV-1 RNA was extracted from blood plasma samples using the RNeasy Lipid Tissue Mini Kit (QIAGEN) with modifications from the manufacturer’s standard protocol50. Briefly, 100 µl of blood plasma was efficiently lysed in 1000 µl Qiazol Lysis Reagent (Qiagen). DNA was removed by treating the column with RNase-free DNase 1 (Qiagen) prior to RNA elution in 40 µl nuclease-free water. Reverse transcription and amplification of partial pol gene were performed using the One-Step Superscript III RT/Platinum Taq High Fidelity Enzyme Mix (ThermoFisher ScientificTM) with the pol-specific primer pair JA269 and JA27251. First-round PCR products were amplified in a nested PCR with DreamTaq Green DNA Polymerase (ThermoFisher ScientificTM) using pol-specific primers JA271 and JA270. PCR products were sequenced in both directions with the nested PCR primers using the BigDye terminator kit v1.1 (Applied Biosystems) and the sequences were determined on an ABI PRISM 3130×1 Genetic Analyzer (Applied Biosystems).

Sampling density

County-exclusive estimates for HIV-1 prevalence for high-risk groups in Kenya were not available when this analysis was done. Hence, the national HIV-1 prevalence estimate for each risk group and the estimates of people infected with HIV-1 in Mombasa and Kilifi counties were used to estimate the sampling density of our study (defined as the proportion of genotyped viral sequences in the estimated number of HIV-infected individuals in a risk group)36.

HIV-1 Subtyping

Newly generated consensus and published pol sequences from coastal Kenya were combined, codon-aligned using ClustalX 2.0.11, and manually adjusted in Geneious Prime 2019 (Version 2019 2.1) ( The combined sequences were then aligned with the Group M (subtypes A-K + Recombinants) HIV-1 subtype reference dataset downloaded from Los Alamos HIV Database (

To infer genetic relatedness, phylogenetic reconstruction was done in PhyML using the general time-reversible substitution model with a gamma-distributed rate variation and proportion of invariant sites (GTR + Γ4 + Ι)53. Branch support was estimated using the Shimodaira-Hasegawa approximate likelihood ratio test (aLRT-SH) as implemented in PhyML54. An aLRT-SH ≥0.9 was considered statistically significant17. The subtype-resolved maximum-likelihood phylogenetic tree was visualized in FigTree (v1.4.3). Unique recombinant forms (URFs) were further resolved by Bootscan analyses using SimPlot and breakpoints identified using a sliding window size of 300 bp and a step size of 20 bp55.

Transmission Cluster analysis

To detect local transmission clusters, newly generated and published HIV-1 pol sequences were combined into one coastal Kenya sequence dataset. In addition, the ten most similar GenBank reference sequences for each coastal Kenya sequence were obtained through BLAST29,30,56. The unique coastal Kenya sequences and the reference sequences were aligned and analysed to determine HIV-1 transmission clusters. Subtype-specific maximum-likelihood phylogenies were reconstructed in PhyML. For each subtype, transmission clusters were manually determined by inspecting the ML tree from root to terminal tips to ensure sequences clustered with high branch support. Monophyletic clades with aLRT-SH support ≥0.9 and which were dominated (≥80%) by sequences from coastal Kenya (compared to reference sequences) were defined as coastal Kenya transmission clusters17. To determine active transmission clusters, sequences were further explored using a genetic distance threshold (≤1.5%) and aLRT-SH branch support of ≥0.9 in Cluster Picker57. Transmission networks were classified based on the number of sequences per cluster into dyads (2 sequences), networks (3–14 sequences) and large clusters (>14 sequences)30.

Diversity analysis

A previously described ML bootstrap approach was employed to determine the genetic diversity in the identified clusters37. Briefly, all sequences in coastal Kenya transmission clusters were used to construct 1000 bootstrap ML phylogenies in Garli 2.058. Diversity estimates were determined in Perl (version 5.30.0) using in-house Perl and Bioperl scripts59. The diversity for each pre-defined cluster was estimated by averaging the pairwise tree-distances between the cluster-specific sequences in each bootstrap tree, resulting in 1000 diversity estimates per cluster. Next, the medians of these 1000 diversity estimates were determined for each cluster and then used in the analysis as previously described37. The scripts used in this analysis is available from the authors upon request.

Bayesian phylogenetic analysis

To estimate the dates of origin (time to most recent common ancestor; tMRCA) of the coastal Kenyan clusters, maximum clade credibility trees were generated using a Bayesian Markov Chain Monte Carlo (MCMC) approach as implemented in BEASTv1.8.260,61. Based on marginal likelihood estimators for model selection and testing in BEAST, the Bayesian Skygrid model with an uncorrelated lognormal relaxed clock and inferred under the GTR + Γ4 + Ι substitution model was adopted as the best fit model for subsequent inferences. BEAST runs of 100–300 million generations were performed, logging samples after every 10000–30000 steps with the initial 10–30% discarded as burn-in. The convergence of MCMC parameter estimates was accessed based on effective sample size estimates (ESS > 200) using Tracer v1.662. Trees were summarized in Tree-Annotator v1.8.2 (BEAST suite) and maximum clade credibility (MCC) trees were visualized in Figtree. We also aimed to dissect the demographic history of the only large coastal Kenya cluster identified. Since the sequences in this cluster did not contain temporal information (all IDU sequences were sampled in 2010), the node height posterior distribution (tMRCA) for the IDU cluster from the transmission clusters analysis described above was used as a tree-root height calibration prior (fixed the tree root height to 1985), guiding a skygrid analysis to estimate the effective population size (Ne) of the large IDU cluster over time.

Statistical analysis

Frequencies and percentages were used to describe the distribution of sequences within the study population by risk group, HIV-1 subtype, calendar year of sampling and county of sampling. A logistic regression model was used to assess factors associated with clustering. Variables with P-values <0.1 in the bivariable analysis were included in the multivariable model. A P-value of <0.05 was defined as statistically significant. A Kruskal-Wallis H test and a post hoc Dunn’s test (with Bonferroni correction for multiple comparisons) were conducted to determine differences in genetic diversity estimates among clusters from multiple risk groups. Statistics were done using Stata version 15 and RStudio (version 1.2.5001), and the packages: DescTools (version 0.99.29, and yarrr (version 0.1.6)63. The full R code is available on request from the authors.

Nucleotide sequence accession numbers

Nucleotide sequences were deposited in GenBank under the following accession numbers: MT084914 - MT085076.

Ethical consideration

All research was performed in accordance with relevant guidelines/regulations. Informed consent was obtained from all participants who provided blood plasma samples from which new HIV-1 sequences were generated (informed consent was not required for subjects whose sequences were obtained from LANL). Plasma samples used to generate the new sequences were obtained from on-going or concluded studies that were also approved by KEMRI/SERU (SERU 3747, 3280 and 3520, and SSC 894). The current study is part of a parent protocol that was reviewed and approved by the Kenya Medical Research Institute (KEMRI) Scientific and Ethics Review Unit (SERU 3547).


  1. 1.

    Kenya National AIDS Control Council. Kenya AIDS Strategic Framework 2014/2015–2018/2019, (2019).

  2. 2.

    Sanders, E. J., Jaffe, H., Musyoki, H., Muraguri, N. & Graham, S. M. Kenyan MSM: no longer a hidden population. AIDS 29, S195–S199, (2015).

    Article  PubMed  PubMed Central  Google Scholar 

  3. 3.

    Sanders, E. J. et al. High HIV-1 incidence, correlates of HIV-1 acquisition, and high viral loads following seroconversion among MSM. Aids 27, 437–446, (2013).

    Article  PubMed  PubMed Central  Google Scholar 

  4. 4.

    Smith, A. D. et al. Heterosexual behaviours among men who sell sex to men in coastal Kenya. AIDS 29(Suppl 3), S201–210, (2015).

    Article  PubMed  PubMed Central  Google Scholar 

  5. 5.

    Kenya National AIDS Control Council. Kenya HIV Prevention Response and Modes of Transmission Analysis., (2009).

  6. 6.

    Bezemer, D. et al. HIV Type 1 transmission networks among men having sex with men and heterosexuals in Kenya. AIDS Res. Hum. retroviruses 30, 118–126 (2014).

    Article  Google Scholar 

  7. 7.

    Gounder, K. et al. Complex Subtype Diversity of HIV-1 Among Drug Users in Major Kenyan Cities. AIDS Res. Hum. Retroviruses 33, 500–510, (2017).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  8. 8.

    Hassan, A. S. et al. HIV-1 subtype diversity, transmission networks and transmitted drug resistance amongst acute and early infected MSM populations from Coastal Kenya. PLoS One 13, e0206177, (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  9. 9.

    Hué, S. et al. HIV type 1 in a rural coastal town in Kenya shows multiple introductions with many subtypes and much recombination. AIDS Res. Hum. retroviruses 28, 220–224 (2012).

    Article  Google Scholar 

  10. 10.

    Osman, S. et al. Diversity of HIV type 1 and drug resistance mutations among injecting drug users in Kenya. AIDS Res. Hum. Retroviruses 29, 187–190, (2013).

    CAS  Article  PubMed  Google Scholar 

  11. 11.

    Price, M. A. et al. Transmitted HIV type 1 drug resistance among individuals with recent HIV infection in East and Southern Africa. AIDS Res. Hum. Retroviruses 27, 5–12, (2011).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  12. 12.

    Sigaloff, K. C. et al. High prevalence of transmitted antiretroviral drug resistance among newly HIV type 1 diagnosed adults in Mombasa, Kenya. AIDS Res. Hum. retroviruses 28, 1033–1037 (2012).

    CAS  Article  Google Scholar 

  13. 13.

    Tovanabutra, S. et al. Evaluation of HIV type 1 strains in men having sex with men and in female sex workers in Mombasa, Kenya. AIDS Res. Hum. Retroviruses 26, 123–131, (2010).

    CAS  Article  PubMed  Google Scholar 

  14. 14.

    Khamadi, S. A. et al. Genetic diversity of HIV type 1 along the coastal strip of Kenya. AIDS Res. Hum. Retroviruses 25, 919–923, (2009).

    CAS  Article  PubMed  Google Scholar 

  15. 15.

    Hassan, A. S. et al. Low prevalence of transmitted HIV type 1 drug resistance among antiretroviral-naive adults in a rural HIV clinic in Kenya. AIDS Res. Hum. Retroviruses 29, 129–135, (2013).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  16. 16.

    Pybus, O. G., Tatem, A. J. & Lemey, P. Virus evolution and transmission in an ever more connected world. Proc. R. Soc. B: Biol. Sci. 282, 20142878 (2015).

    Article  Google Scholar 

  17. 17.

    Hassan, A. S., Pybus, O. G., Sanders, E. J., Albert, J. & Esbjörnsson, J. Defining HIV-1 transmission clusters based on sequence data. AIDS 31, 1211 (2017).

    Article  Google Scholar 

  18. 18.

    Onywera, H. et al. Surveillance of HIV-1 pol transmitted drug resistance in acutely and recently infected antiretroviral drug-naive persons in rural western Kenya. PLoS one 12, e0171124 (2017).

    Article  Google Scholar 

  19. 19.

    Chimukangara, B. et al. Moderate-to-high levels of pretreatment HIV drug resistance in KwaZulu-Natal Province, South Africa. AIDS Res. Hum. retroviruses 35, 129–138 (2019).

    Article  Google Scholar 

  20. 20.

    Rodgers, M. A. et al. Sensitive next-generation sequencing method reveals deep genetic diversity of HIV-1 in the Democratic Republic of the Congo. J. virology 91, e01841–01816 (2017).

    CAS  Article  Google Scholar 

  21. 21.

    Tongo, M. et al. Unravelling the complicated evolutionary and dissemination history of HIV-1M subtype A lineages. Virus Evolution 4, vey003 (2018).

    PubMed  PubMed Central  Google Scholar 

  22. 22.

    Faria, N. R. et al. Phylodynamics of the HIV-1 CRF02_AG clade in Cameroon. Infection, Genet. Evolution 12, 453–460 (2012).

    Article  Google Scholar 

  23. 23.

    Faria, N. R. et al. The early spread and epidemic ignition of HIV-1 in human populations. science 346, 56–61 (2014).

    ADS  CAS  Article  Google Scholar 

  24. 24.

    Esbjörnsson, J., Mild, M., Månsson, F., Norrgren, H. & Medstrand, P. HIV-1 molecular epidemiology in Guinea-Bissau, West Africa: origin, demography and migrations. PLoS one 6, e17025 (2011).

    ADS  Article  Google Scholar 

  25. 25.

    Brenner, B. G. et al. High rates of forward transmission events after acute/early HIV-1 infection. J. Infect. Dis. 195, 951–959, (2007).

    CAS  Article  PubMed  Google Scholar 

  26. 26.

    Volz, E. M. et al. HIV-1 transmission during early infection in men who have sex with men: a phylodynamic analysis. PLoS Med 10, e1001568; discussion e1001568, (2013).

  27. 27.

    Ratmann, O. et al. Sources of HIV infection among men having sex with men and implications for prevention. Sci. Transl. Med. 8, 320ra322–320ra322 (2016).

    Article  Google Scholar 

  28. 28.

    Poon, A. F. et al. Near real-time monitoring of HIV transmission hotspots from routine HIV genotyping: an implementation case study. lancet HIV. 3, e231–e238 (2016).

    Article  Google Scholar 

  29. 29.

    Sallam, M. et al. Molecular epidemiology of HIV-1 in Iceland: Early introductions, transmission dynamics and recent outbreaks among injection drug users. Infection, Genet. Evolution 49, 157–163 (2017).

    ADS  Article  Google Scholar 

  30. 30.

    Esbjörnsson, J. et al. HIV-1 transmission between MSM and heterosexuals, and increasing proportions of circulating recombinant forms in the Nordic Countries. Virus evolution 2, vew010 (2016).

    Article  Google Scholar 

  31. 31.

    Bruhn, C. A. et al. The origin and emergence of an HIV-1 epidemic: from introduction to endemicity. AIDS 28, 1031–1040, (2014).

    Article  PubMed  Google Scholar 

  32. 32.

    Frentz, D. et al. Patterns of transmitted HIV drug resistance in Europe vary by risk group. PLoS one 9, e94495 (2014).

    ADS  Article  Google Scholar 

  33. 33.

    Grabowski, M. K. et al. The Role of Viral Introductions in Sustaining Community-Based HIV Epidemics in Rural Uganda: Evidence from Spatial Clustering, Phylogenetics, and Egocentric Transmission Models. PLOS Med. 11, e1001610, (2014).

    Article  PubMed  PubMed Central  Google Scholar 

  34. 34.

    De Oliveira, T. et al. Transmission networks and risk of HIV infection in KwaZulu-Natal, South Africa: a community-wide phylogenetic study. lancet HIV. 4, e41–e50 (2017).

    Article  Google Scholar 

  35. 35.

    Bbosa, N. et al. Phylogeography of HIV-1 suggests that Ugandan fishing communities are a sink for, not a source of, virus from general populations. Sci. Rep. 9, 1051 (2019).

    ADS  Article  Google Scholar 

  36. 36.

    National AIDS and STI Control Programme. Kenya HIV County Profiles 2016., (2017).

  37. 37.

    Esbjörnsson, J. et al. Inhibition of HIV-1 Disease Progression by Contemporaneous HIV-2 Infection. N. Engl. J. Med. 367, 224–232, (2012).

    CAS  Article  PubMed  Google Scholar 

  38. 38.

    Middelkoop, K. et al. Epidemiology of HIV-1 subtypes among men who have sex with men in Cape Town, South Africa. JAIDS J. Acquired Immune Deficiency Syndromes 65, 473–480 (2014).

    Article  Google Scholar 

  39. 39.

    Konou, A. A. et al. Genetic diversity and transmission networks of HIV-1 strains among men having sex with men (MSM) in Lomé, Togo. Infection, Genet. Evolution 46, 279–285 (2016).

    Article  Google Scholar 

  40. 40.

    Côté, A.-M. et al. Transactional sex is the driving force in the dynamics of HIV in Accra, Ghana. Aids 18, 917–925 (2004).

    Article  Google Scholar 

  41. 41.

    Parczewski, M. et al. Expanding HIV-1 subtype B transmission networks among men who have sex with men in Poland. PLoS One 12, e0172473 (2017).

    Article  Google Scholar 

  42. 42.

    Skar, H. et al. Dynamics of two separate but linked HIV-1 CRF01_AE outbreaks among injection drug users in Stockholm, Sweden, and Helsinki, Finland. J. virology 85, 510–518 (2011).

    CAS  Article  Google Scholar 

  43. 43.

    Beckerleg, S., Telfer, M. & Hundt, G. L. The rise of injecting drug use in East Africa: a case study from Kenya. Harm Reduct. J. 2, 12, (2005).

    Article  PubMed  PubMed Central  Google Scholar 

  44. 44.

    Stannah, J. et al. HIV testing and engagement with the HIV treatment cascade among men who have sex with men in Africa: A systematic review and meta-analysis. Lancet HIV (2019).

  45. 45.

    Kenya National AIDS Control Council. Kenya AIDS Responce Progress Report 2018, (2018).

  46. 46.

    Rhodes, T. et al. Is the promise of methadone Kenya’s solution to managing HIV and addiction? A mixed-method mathematical modelling and qualitative study. BMJ open. 5, e007198 (2015).

    Article  Google Scholar 

  47. 47.

    Novitsky, V., Moyo, S., Lei, Q., DeGruttola, V. & Essex, M. Impact of sampling density on the extent of HIV clustering. AIDS Res. Hum. retroviruses 30, 1226–1235 (2014).

    Article  Google Scholar 

  48. 48.

    Novitsky, V. et al. Phylodynamic analysis of HIV sub-epidemics in Mochudi, Botswana. Epidemics 13, 44–55 (2015).

    Article  Google Scholar 

  49. 49.

    Los Alamos National Library. HIV-1 database at the Los Alamos National Library, (2019).

  50. 50.

    Esbjörnsson, J. et al. Frequent CXCR4 tropism of HIV-1 subtype A and CRF02_AG during late-stage disease-indication of an evolving epidemic in West Africa. Retrovirology 7, 23 (2010).

    Article  Google Scholar 

  51. 51.

    Hedskog, C. et al. Dynamics of HIV-1 quasispecies during antiviral treatment dissected using ultra-deep pyrosequencing. PLoS one 5, e11345 (2010).

    ADS  Article  Google Scholar 

  52. 52.

    Larkin, M. A. et al. Clustal W and Clustal X version 2.0. bioinformatics 23, 2947–2948 (2007).

    CAS  Article  Google Scholar 

  53. 53.

    Guindon, S., Lethiec, F., Duroux, P. & Gascuel, O. PHYML Online—a web server for fast maximum likelihood-based phylogenetic inference. Nucleic acids Res. 33, W557–W559 (2005).

    CAS  Article  Google Scholar 

  54. 54.

    Anisimova, M. & Gascuel, O. Approximate likelihood-ratio test for branches: A fast, accurate, and powerful alternative. Syst. Biol. 55, 539–552, (2006).

    Article  PubMed  Google Scholar 

  55. 55.

    Lole, K. S. et al. Full-length human immunodeficiency virus type 1 genomes from subtype C-infected seroconverters in India, with evidence of intersubtype recombination. J. virology 73, 152–160 (1999).

    CAS  Article  Google Scholar 

  56. 56.

    Kouyos, R. D. et al. Molecular epidemiology reveals long-term changes in HIV type 1 subtype B transmission in Switzerland. J. Infect. Dis. 201, 1488–1497, (2010).

    Article  PubMed  Google Scholar 

  57. 57.

    Ragonnet-Cronin, M. et al. Automated analysis of phylogenetic clusters. BMC Bioinforma. 14, 317, (2013).

    Article  Google Scholar 

  58. 58.

    Zwickl, D. J. Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion, (2006).

  59. 59.

    Stajich, J. E. et al. The Bioperl toolkit: Perl modules for the life sciences. Genome Res. 12, 1611–1618 (2002).

    CAS  Article  Google Scholar 

  60. 60.

    Suchard, M. A. et al. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus. Evolution 4, vey016 (2018).

    Google Scholar 

  61. 61.

    Drummond, A. J., Ho, S. Y., Phillips, M. J. & Rambaut, A. Relaxed phylogenetics and dating with confidence. PLoS Biol. 4, e88 (2006).

    Article  Google Scholar 

  62. 62.

    Rambaut, A., Drummond, A. J., Xie, D., Baele, G. & Suchard, M. A. Posterior summarization in Bayesian phylogenetics using Tracer 1.7. Syst. Biol. 67, 901–904 (2018).

    CAS  Article  Google Scholar 

  63. 63.

    Phillips, N. D. Yarrr! The pirate’s guide to R. APS Observer 30 (2017).

  64. 64.

    Faria, N. R. et al. Distinct rates and patterns of spread of the major HIV-1 subtypes in Central and East Africa. PLoS Pathog. 15, e1007976–e1007976 (2019).

    CAS  Article  Google Scholar 

Download references


We thank the International AIDS Vaccine Initiative (IAVI) for supporting the HIV at-risk cohort studies in Kilifi, Kenya. We are also grateful to the staff in the HIV/STI project at the Kenya Medical Research Institute (KEMRI) in Kilifi for their commitment to serving MSM. Finally, we thank Professor Philippe Lemey, Katholieke Universiteit Leuven, for providing useful input for dating and determining past population dynamics of the coastal Kenyan IDU clusters. This manuscript was submitted for publication with the permission from the Director of the Kenya Medical Research Institute (KEMRI). This work was supported through the Sub-Saharan African Network for TB/HIV Research Excellence (SANTHE), a DELTAS Africa Initiative [grant # DEL-15–006]. The DELTAS Africa Initiative is an independent funding scheme of the African Academy of Sciences (AAS)’s Alliance for Accelerating Excellence in Science in Africa (AESA) and supported by the New Partnership for Africa’s Development Planning and Coordinating Agency (NEPAD Agency) with funding from the Wellcome Trust [grant # 107752/Z/15/Z] and the UK government. This work was also supported in part by funding from the Swedish Research Council (grant # 2016–01417) and the Swedish Society for Medical Research (grant # SA-2016). The views expressed in this publication are those of the author(s) and not necessarily those of AAS, NEPAD Agency, Wellcome Trust, IAVI, Swedish Research Council, or the UK government. Open access funding provided by Lund University.

Author information




A.S.H., J.E., G.N. and E.J.S. conceptualized and designed the study. A.S.H., J.E. and E.J.S. provided funding for the study. E.J.S. and S.M.G. provided samples from which new sequences used in the study were generated. G.N.M. performed lab work, inferential analyses and produced all figures and tables. J.N. assisted with data analysis. A.S.H., J.E. and E.J.S. provided supervisory guidance. G.N.M. wrote the original draft manuscript and all the authors reviewed and edited the manuscript prior to submission.

Corresponding author

Correspondence to Joakim Esbjörnsson.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Nduva, G.M., Hassan, A.S., Nazziwa, J. et al. HIV-1 Transmission Patterns Within and Between Risk Groups in Coastal Kenya. Sci Rep 10, 6775 (2020).

Download citation


By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.


Quick links

Nature Briefing

Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.

Get the most important science stories of the day, free in your inbox. Sign up for Nature Briefing