Introduction

Respondent-driven sampling (RDS) is a method to sample hard-to-reach populations such as injecting drug users, men who have sex with men (MSM), and sex workers1. It uses chain-referral sampling, building on the underlying contact network for recruitment of participants. RDS starts by selecting seed individuals from the population of interest. They receive a fixed number of coupons to distribute to individuals in their contact network who meet certain eligibility criteria. In turn, individuals receiving a coupon recruit new participants among their contacts, leading to successive recruitment waves until the target number of individuals for the survey is reached2. A drawback of the method is that the final sample is not representative of the target population, introducing bias in naïve estimates of, say, prevalence.

Statistical procedures using generalized Horvitz-Thompson estimators can reduce biases3,4. In those, weights are computed according to referral patterns, estimated network size, number of ties between subgroups of interest5, differences in the number of partners declared by participants6 or homophily in chain referrals7. Weights can also be computed using bootstrap procedures8. Yet, numerous issues affect the reliability and validity of RDS estimates because several hypotheses are required3,9,10,11: (1) the population size need to be large compared to the RDS sample; (2) sampling must occur with replacement; (3) population homophily should be weak; (4) seeds should be selected at random.

Recently, the network model-assisted (NMA) method has been shown to increase the robustness in prevalence estimation from RDS data with respect to these assumptions11. In this approach, characteristics of the RDS data are used to simulate population networks that resemble the source population. The simulated networks conform to degree distributions and other individual characteristics of participants, but, more importantly, allow pair characteristics such as homophily between nodes with similar characteristics, to be taken into account. In simulation studies, the NMA approach fared better than other methods to estimate HIV prevalence in RDS data even in the case of population homophily according to HIV status, when seeds were selected according to HIV status; when the RDS sample size was large relative to the underlying population and the degree distribution changed according to the outcome of interest11. The approach however requires knowing the HIV serostatus of all partners of individuals in the RDS sample even those who were not included in the study. This is unlikely to be the case for most RDS studies as serostatus is often determined only in participants. Here, we develop the use of “known patterns” for applying NMA method in this situation12. An additional concern is that homophily may not be easily ascribed to the underlying population or to preferential recruitment in RDS data13. Assumptions of no preferential recruitment were made in the original method and it is unknown whether accounting for “homophily” of non-specified origin may improve prevalence estimates.

Here, we first show that an iterative approach for applying NMA method leads to significant performance improvement in the case of non-random seed selection in an underlying network with strong homophily and differential activity. We next examine the effect of preferential recruitment on prevalence estimates. We finally compare NMA methods to other methods on RDS data from MSM in Brazil to estimate HIV prevalence.

Material and Methods

We first recall the principle of weighted estimates for estimating HIV prevalence with RDS data and describe an extension to Giles’ method for NMA prevalence estimation. Then, a simulation study is presented to investigate the performance of this estimator with three known sources of bias: seed selection, seed dependence, and population homophily. We next investigate the presence of preferential recruitment in the RDS. Finally, we apply the methods to data collected in an RDS study of MSM in Brazil.

Current methods for estimating HIV prevalence in RDS data

An RDS sample for HIV prevalence yields a collection of random variables (yi) corresponding to the serological status of individuals, with yi = 1 for HIV positive and 0 otherwise. Horvitz-Thompson estimates of the prevalence are computed as \(\hat{\pi }=\frac{{\sum }_{i=1}^{n}{w}_{i}{y}_{i}}{{\sum }_{i=1}^{n}{w}_{i}}\) where weights wi correspond to the inverse probability of sampling. Methods differ in the choice of wi: inverse degree for RDS-I5, with further modifications in RDS-II6 and the SS (successive sampling) estimator8. The tree bootstrap method allows for another way to compute weights14. These approaches are implemented in RDS Analyst, the RDS package for the R Software15, and the RDS treeboot package14.

In the NMA method, the computation of weights requires network simulation. This makes it possible and easy to include individual characteristics in modeling contacts, such as infectious status11, and to account for population homophily, i.e., the differential probability of making links with other individuals based on a specific characteristic. In this approach, population homophily, for example for HIV serostatus, can be defined as the ratio of the observed number of links between discordant HIV serostatus to the number of such links expected by chance15 where values smaller than 1 correspond to more links than expected between individuals with identical serostatus. This definition corresponds with the parameters used in exponential random graph models (ERGM).

In the original NMA method, it was assumed that the serostatus of all contacts, whether included or not in the RDS, had been reported by each participant. This allowed estimating homophily in the following algorithm for seroprevalence:

Initialization:

• compute degree distribution d°Y in RDS data according to serostatus Y

• compute initial weights wi = w°i according to an arbitrary RDS method (here SS method)

• compute homophily h° from RDS data

Repeat:

Step 1: Simulate M1 networks with degree distribution (d°i) and homophily h°

Step 2: Obtain M2 RDS samples from each simulated network with seeds characteristics as in the original RDS data and with the same number of participants.

Step 3: Compute weights distribution wi by degree and serostatus y based on the simulated RDS averaging over the M1 * M2 RDS samples

Until:

• Weights convergence or 5 iterations

Output:

• Compute seroprevalence \(\hat{\pi }=\frac{{\sum }_{i=1}^{n}{w}_{i}{y}_{i}}{{\sum }_{i=1}^{n}{w}_{i}}\) using final weights

Confidence intervals were obtained by the parametric bootstrap using the weights in the last iteration of the procedure a network was simulated using the last weights, RDS was sampled from the network and prevalence estimated from this data11. We developed a method to accelerate the Network-Model Assisted method by using configuration network (see supplementary information).

Working with limited information on homophily in the network-assisted method

However, in our experience and the data presented below, participants are unlikely to know or provide information on the serostatus of all of their contacts. Often, accurate information on HIV serostatus about sexual partners is only available in those who were recruited in the RDS study. We propose the “NMA-Iter” approach in which imputation of the missing information is first made using the fraction of HIV seropositive observed in direct partners and then updated is updated with other quantities at each iteration. In other words, we start assuming that if 1 out of 3 of the recruited contacts were HIV seropositive for a respondent, then on average 1/3 of all of his contacts were HIV positive, and update this information using the degree distribution computed at each iteration as described in the following algorithm:

Method NMA-Iter

Initialization:

• Initialisation steps of NMA

• For each individual i in the sample set Ii = ki * Pi (infected partners) and Ui=ki* (1-Pi) (uninfected partners) where Pi is the percentage of HIV seropositive recruited partners and ki reported the number of partners.

• Compute Homophily:

H = 2 * pw * degreewhiv− * (1-pw)*degreewhiv+/(degreewoverall * degreeheterophilic)

Where pw is the overall prevalence of HIV infection (weighted mean of xi), and degreeoverall,w, degreehiv-,w, degreehiv+,w, and degreeheterophilic,w are, respectively, the mean degree in the overall population (weighted mean of ki), in those HIV− (weighted mean of ki when yi = 0), HIV+ (weighted mean of ki when yi = 1) and that of heterophilic links (weighted mean of yi Ui+ (1 − yi) Ii). All these quantities are post-calibrated with current weights wi

Repeat:

Step 1–3 of NMA method

Step 4:

• Compute Homophily as above with current weights.

• Simulate a network Nw with degree distribution according to serostatus D(s) and Homophily H.

• From Nw, compute Y(d,s) the distribution of the number of HIV+ partners in individuals with status s and degree d.

• For each patient in the RDS dataset, sample Ii from Y(di, si), compute Ui = di − Ii

Loop and stop as in NMA method

Simulation study

Network and RDS simulation

To investigate the properties of the new proposed estimation procedure (NMA-Iter), we conducted simulation experiments, obtaining RDS data in a population where HIV prevalence was fixed at 20%. We varied the following parameters in the population:

  • Population size (Nw). Populations were either 10000 or 1000, so that an RDS sample of size 400 corresponded to either a small (4%) or large (40%) portion of the whole population.

  • Differential Activity. DA is defined as the ratio of the average number of partners in HIV seropositive participants and others. It allows simulating differential recruitment according to the characteristic of interest, here HIV serostatus. DA = 1 means no difference, while DA = 2 corresponds to twice as many partners for HIV positive participants.

  • Homophily. We finally varied homophily according to the characteristic of interest. Recruitment could be irrespective of serostatus (Homophily = 1) or lead to highly clustered data with twice as many partners of the same status than expected (H = 2).

These characteristics defined six scenarios, for which 150 networks were simulated in each case. Network simulations were performed using the method described in the appendix and exponential random graph models (ERGM) in the R package statnet16.

RDS samples of size 400 were sampled in each simulated network, using three coupons per participant and an (independent) participation rate of 50%. We started recruitment with seven seeds chosen at random or with seven infected seeds selected, in both cases selected with probability proportional to the degree.

Results were compared graphically by boxplots of the bootstrap distributions. We also computed mean squared errors (MSE) to compare overall errors, including bias and variance of the estimators, for the NMA, NMA-Iter and SS methods.

Effect of preferential recruitment (PR)

To investigate the impact of PR in prevalence estimates we obtained RDS samples in the absence of preferential recruitment, then increasing the probability for a partner with the same serostatus to be recruited by 1.5 compared to a partner with discordant serostatus. These situations are described as “No PR” and “+50% PR”.

Comparison between methods

We first computed prevalence estimates in the simulated RDS data using the NMA method, assuming full information on homophily. We then computed prevalence estimates using NMA-Iter with limited information on homophily. We used boxplots to show the corresponding distributions.

We then compared estimates from NMA-Iter to estimates obtained using classical RDS analysis methods (RDS-I, RDS-II, SS) in several situations with varying (1) serostatus of seeds (random or infected), (2) RDS sample size relative to the underlying network: 40% (size 1000) or 4% (size 10000) and (3) magnitude of homophily and DA (1 or 2 for both).

Investigating seed selection

Convergence plots were used to visualize the effect of initial waves on the stability of the final estimates3. In this plot, the prevalence estimates are plotted as a function of the accruing number of waves of recruitment. It shows both the effect of initial seed selection and that of sample size.

As it is also customary to drop initial waves in prevalence computations to reduce dependence on the seeds, we also show “drop-first convergence plots” where prevalences are computed using all waves but the first ones (i.e., dropping the first, then the first and second, and so on).

We visually inspected convergence plot, and drop-first convergence plots for the SS and NMA-Iter for RDS sampled in networks with no homophily and low DA and in networks with high homophily and high DA. These plots are shown starting at wave 3 and stopping at the last wave of recruitment which might vary from one simulation to another.

Application to HIV prevalence in Brazil

Several RDS studies have been carried out in Brazil in populations with high HIV prevalence17,18,19,20,21,22,23. Here, we re-analyzed the nationwide survey conducted in 2009 to estimate HIV prevalence in MSM23. The survey was conducted in ten Brazilian cities, as described in Table 1.

Table 1 characteristics of the RDS in 10 cities of Brazil.

We applied seven methods to estimate HIV prevalence in each city. For the NMA method, we used a population size of 10000 MSM in each city (a sensitivity analysis showed that estimates did not change significantly above 5000 individuals). We considered that two methods disagreed for a given city when at least one-point estimate was not within the confidence intervals of the other for that same city.

In order to investigate the differences among estimates, we looked at seeds characteristics, population size and examined the convergence plots3 and drop-first convergence plots.

Ethics

Data from The Brazilian RDS survey was approved by the Brazil National Ethical Research Committee (Comissão Nacional de Ética em Pesquisa, CONEP # 14494). Informed consent was obtained for all participants who signed consent forms for participation to the survey and later use of results23. All methods were carried out in accordance with relevant guidelines and regulations1. No additional approval was necessary for the current research work.

Results

Simulations

Performance of the NMA modified method

The NMA-Iter method allowed estimating prevalence even when serostatus information was limited to RDS participants, with estimates close to the target value (20%; Fig. 1). The results very similar to the original NMA method. The variability of the estimates from the NMA-Iter method did not increase compared to the original NMA, with similar MSE in both cases (1.51 × 10−3 and 1.59 × 10−3 for no homophily nor DA; 2.51 × 10−3 and 2.55 × 10−3 for strong homophily and DA). Additional preferential recruitment led to an increase in bias for both the NMA and the NMA-Iter methods, even though the true prevalence value remained in the confidence intervals for the simulated RDS sample sizes (Fig. 1).

Figure 1
figure 1

Prevalence estimates with network model assisted method and effect of Preferential recruitment. NMA is the original method requiring full information on serostatus of contacts, NMA-Iter use only serostatus of reported contacts. Preferential recruitment were fixed at no preferential recruitment and +50% of preferential recruitment (left to right). Simulation were performed in networks of size 10000 and: (a) population homophily at 1 and DA at 1; (b) population homophily at 2 and DA at 2.

The estimated homophily was close to the nominal in the case of no homophily in the simulated population (H = 1) and no differential activity and no preferential recruitment (estimated value 0.96 [0.87–1.05]). Bias occurred with deviations from the random situation, with downward bias for increasing differential activity (estimated homophily 0.90 [0.81–1.02] for DA = 2 and no PR), and upward bias with preferential recruitment and no DA (1.09 [1.01–1.18] for PR + 50%). With population homophily (H = 2) in the simulated population, this was generally underestimated in the NMA method (estimated homophily = 1.45 [1.29–1.73]), and the estimate increased in case of additional preferential recruitment (1.50 [1.16–2.00] for + 50% PA).

The seroprevalence estimates in the NMA-Iter is shown in Fig. 2 along with other methods. When the population size was large relative to the RDS and seeds were selected at random, all methods yielded similar results even with high homophily and DA. (Fig. 2.1). The bias introduced by selecting only HIV + seeds was the smallest for the RDS-I and NMA-Iter methods (Fig. 2.2). When populations were small (1000 individuals, RDS including 40% of population size), and when seeds were chosen at random, the NMA-Iter method, SS method and tree bootstrap method performed the best under conditions of high homophily and high differential activity (Fig. 2.3). The NMA-Iter remained unbiased when all seed were HIV + in networks with high homophily or DA (Fig. 2.4 and 2.5). Finally, the NMA-Iter method provided the best results for small populations, high homophily and DA, and use of all HIV + seeds (Fig. 2.6).

Figure 2
figure 2

Prevalence estimates using different network patterns in networks of different size. Nw: network size, DA: degree activity, H: Homophily. DA can be at 1 or 2 and H can be at 1 or 2, Nw can be at 10000 or 1000. RDS are of size 400. Preferential recruitment is at 1.

The presence of preferential recruitment had an impact on seroprevalence estimates for all methods tested (NMA-Iter, RDS-I, RDS-II, SS) (Fig. 3). The bias in the NMA-Iter method remained smaller than in other methods but there was a corresponding increase in variance of the estimates. In the most extreme situation (H = 2, DA = 2, PR = 1.5), the MSE was 12.3 × 10−3 for NMA-Iter and 4.8 × 10−3 for the SS method.

Figure 3
figure 3

Comparison of different methods to estimates prevalence in RDS with different preferential recruitment. Preferential recruitment were fixed at “No PR” and “+50% PR” (left to right). Simulation were performed in networks of size 10000 and: (a) homophily at 1 and DA at 1; (b) homophily at 2 and DA at 2.

Comparison of SS and NMA-Iter in convergence and drop-first convergence plot

In networks combining differential activity and homophily, the convergence plots showed that the prevalence computed by the SS method was more sensitive to the choice of initial seeds than the NMA-Iter method, with estimates reaching a plateau only after the 10th wave for the SS method and much quicker with the NMA-Iter method (Fig. 4). The drop-first convergence plots even showed that dropping early waves may introduce bias when using the SS method, while the NMA-Iter method remained satisfactory (Fig. 4).

Figure 4
figure 4

Convergence plot and drop-first convergence plot using SS method (upper part) and NMA-Iter (lower part) in network with high homophily and high degree activity (H = 2 and DA = 2).

HIV prevalence estimation for Brazilian MSM

Table 2 shows the results for the ten cities using six methods. For most cities, point estimates were between 2 and 10% regardless of the method for estimation, with the exception of Rio de Janeiro and Brasilia where prevalence is around 20%. The results from the different methods did not agree in Recife, Campo Grande, Rio de Janeiro, Santos, Curitiba, and Itajaí (i.e., at least one pair of methods with one-point estimate falling outside the confidence interval of the other method). In Brasilia, variations were substantial for the different methods with correspondingly large CIs. In this city, the point estimate was on the order of 10% for RDS-I, RDS-II, and SS methods, while the Tree bootstrap and NMA-Iter returned a point estimate of around 20%.

Table 2 Prevalence of HIV in MSM in 10 Brazilian cities based on different RDS analysis methods and from from Kerr and al. previous work.

Seed selection

The proportion of HIV + seeds was high in comparison to the estimated prevalence in Recife (3/10) and Itajaí (5/15) (Table 1). Interestingly, the pattern of results in Itajai was similar to the situation in Fig. 2.42.5, with RDS-I yielding smaller prevalence than NMA-Iter and the 3 other methods yielding higher value. For Recife, the results were similar to case 2.6, with NMA-Iter yielding higher estimates than the others.

Population size

In Itajai, it seems that a large part of the MSM population was included in the RDS. Indeed, the RDS included 310 individuals, whereas the MSM population was estimated to range between 700 and 2000 (1 to 3% MSM in adults in a city with 7000 adults)8. Once again, the pattern of estimates looked like in Fig. 2 where the RDS fraction was large in the overall population.

Network structure

Homophily calculated using NA-Iter varied from 0.46 for Brasilia to 1.72 in campo grande (Table 1). As seen in the simulation part, non-random homophily might impact results from RDS I and RDS, even if seeds are selected at random. SS and treeboot are biased in the context of seed selection. This is in accordance with our results in Rio de Janeiro, Campo Grande, and Brasilia where homophily values were far from 1.

Convergence of estimates

Figure 5 shows convergence plots for four cities using the SS method. In Manaus, the graphical analysis demonstrated that the sample appeared to achieve convergence (Fig. 5a1,a2). In Campo Grande (Fig. 5b1,b2), convergence plots started with each seed indicated the possibility of distinctive sub-networks being selected, with little indication of convergence. However, in this city, the plots using the NMA-Iter methods suggested otherwise, with little change in the last waves (see Fig. 6).

Figure 5
figure 5

Graphical analysis of convergence. Top row: bottleneck plots showing for each city prevalence estimates evolution per seed during recruitment. Bottom row: convergence plots showing overall prevalence estimate during the recruitment process for each city. All estimates are based on the SS method.

Figure 6
figure 6

Convergence plot and drop-first convergence plot using NMA-Iter method for the city of Rio de Janeiro (upper part) and Campo Grande (lower part). W: Wave. Results from Rio De Janeiro confirmed the absence of convergence of the estimates. Results from Campo Grande show a convergence around 4% of prevalence. Impact of seed selection is visible up to wave 9.

Lack of convergence of estimates was demonstrated in Brasilia, Santos (not shown) and Rio de Janeiro (Fig. 5c1,c2). In Rio de Janeiro, even at the end of the sample, the slope of the convergence curve appears not to be leveling. It does create doubt that the estimated value is not the final value (Fig. 5c1,c2). This was confirmed for this city by using convergence plot and drop-first convergence plot using NMA-Iter method. Convergence plots showed substantial variations of the estimates before reaching waves 9. Drop-first convergence plot added value by showing that the last waves had still a strong impact on estimates, highlighting that equilibrium was not reached at the end of the RDS process (Fig. 6).

Discussion

In this work, we investigated the use of the Network-Model Assisted method for prevalence estimation using RDS data with limited serostatus information. We found that an iterative method (NMA-Iter) had good properties when information on homophily was limited, even in the case of high differential activity, population homophily and non-random seed selection. An application of these methods to a Brazilian dataset allowed a critical assessment of the reported HIV prevalence among MSM.

NMA-Iter method with respect to other methods

Network modelling is a promising method for estimating prevalence in RDS data as it allows using the characteristics of contact networks, however it requires information on the serostatus of all partners11,15. Here, we showed that prevalence estimates obtained when only the subset of individuals included in the RDS sample remained unbiased and did not increase the variability of estimates. This is of importance in practice, as only the serostatus of participants will be available in most RDS samples. An iterative update of all quantities provided better results than a single-step approach (results not shown). The NMA-Iter method kept the advantages already described for NMA compared to other methods in case of seed selection, small network size, population homophily and differential activity. It also compared favorably with the tree bootstrap methods in simulations.

Several assumptions are required to use RDS data for seroprevalence estimation. First, seeds should be selected at random in the population. In practice, one may end up including preferentially, for example, HIV infected individuals as seeds if they can be reached more easily when they seek care or to include a minimum number of cases when seroprevalence is low. The effect of initial seeds should vanish over waves of recruitment in RDS samples8,10, and this led to recommending dropping early waves before computing estimates. The “drop-first” plot showed that seed selection could affect some methods more heavily than others, was present in the SS method even when the first waves were discarded from the analysis and was much reduced in the NMA-Iter analysis. Using only infected individuals as seeds even led to good performance with the NMA-Iter method irrespective of the characteristics of both the population and the RDS size. The “drop-first” plot is an addition to diagnostic convergence plots3 for visual inspection of the effect of seed selection.

A second potential limitation is the size of the source population relative to the RDS sample, as the replacement assumption is more likely to be violated when the RDS sample amounts to a large part of this population. In the data we considered, the city of Itajaí may illustrate this situation, as the MSM population size is expected to range between 700 and 2000, while the RDS sample included 310. Simulations for small populations have shown that most methods are biased in this situation, even though it remains a minor contributor to overall bias4,24. This was confirmed for the SS estimator and the NMA-Iter that yielded consistent estimates even with sampling fraction as high as 40% of the population.

Last, there are effects of the underlying network structure, for example homophily according to HIV status which has been described among MSM, which affect the RDS data25,26. Identifying prevalence estimation methods that can account for it is of importance. The NMA-Iter method, as it estimates homophily from the RDS data and takes it into account for estimation, could reduce bias even with strong homophily and differential activity. These results were however shown with random selection among partners in the RDS data, i.e. no preferential recruitment. The main effect of preferential recruitment according to HIV status should be to increase homophily in the collected data. As both sources of homophily can not be told apart in RDS data, one could assume that estimating a single “working” homophily could preserve the characteristics of the NMA method13. In other words, not identifying the cause of homophily may be alright as homophily is a nuisance parameter in the NMA method rather than the target of estimation. Indeed, our simulations showed that the estimated “homophily” parameter in the NMA method changed with both preferential recruitment and population homophily, as it increased with increasing preferential recruitment and population homophily. The simulations showed that the unbiasedness observed with population homophily alone was not preserved in the case of preferential recruitment. NMA-Iter remained less biased than the other methods, but it was also more variable. It would be of interest to examine whether the partial identification reported in Crawford could further improve the NMA results in this respect13.

HIV prevalence in MSM in Brazil

Stigma and discrimination against MSM remain high in Brazil, especially among the poor and little educated populations20,27,28. This makes it difficult to obtain reliable HIV prevalence estimates among MSM and indicates RDS as a choice method for assessing prevalence17. Our analysis however showed that estimates changed according to the method and justified looking for those which are the less affected by characteristics of the underlying population. In this respect, we investigated seed selection, population size, recruiting practices, and network structure to find that the NMA-Iter method performed better than other methods. This gives ground to the consideration of its results as more relevant.

Yet, even with the NMA-Iter method, convergence to a well-defined HIV prevalence estimate did not occur in some cities. Too small sample size could be the reason. However, the 3 cities where this was the most apparent (Brasilia, Santos, Rio de Janeiro) did not have particularly small RDS sample size compared with other cities and, furthermore, simulations did not evidence cases of lack of convergence with more than 300 participants. It is more likely that the structure of the population may be at stake. Indeed, recent work indicated that clustering in the population could make prevalence estimates challenging, even after reaching a reasonable sample size29,30. As network structure cannot be known a priori, it is also reasonable to recommend analyzing RDS data during collection, to detect, as early as possible issues with the recruitment process which could lead to study failure31.

In this paper we have pursued Gile’s injunction to analyze and diagnose sources of bias in RDS studies3. The understanding of variation between estimates with different methods is essential to allow for meaningful comparisons. Finding the least biased estimate remains a crucial concern not just for statisticians, but for health authorities to follow the evolution of prevalence over time and to evaluate intervention effectiveness. In Brazil, comparable, national level RDS survey has been conducted among MSM, and been compared to the study reported here32,33. Controversy about the high seroprevalences reported, especially among young MSM reiterate the need for improved estimator methods satisfying both methodological and practical concerns to separate methodological and programmatic. How much of the differences found are due to surveillance implementation challenges, estimator methods, changes to the populations and behaviors classified as MSM, or changes in the underlying epidemic? Here we contribute to this discussion, demonstrating that network-assisted model, despite being more time consuming, was the least biased for sources explored in this study.

Limitations

The issues outlined in this work are known shortcomings for the RDS methods3,4. Although we have shown that the NMA-Iter method was better than most other methods, we cannot exclude that other unmodelled phenomena contribute to errors in the seroprevalence estimates.

Conclusion

RDS remains at the core of HIV surveillance methods throughout the world for difficult to reach populations. Measuring changes in HIV prevalence is however limited by concerns regarding the variability of findings generated through RDS. The network-assisted model is a powerful method to estimate prevalence using RDS data that may be less biased than other methods. Its use should be encouraged, even if it still requires the use of specific software. These changes are all the more necessary when the effect of treatment as prevention or pre-exposure prophylaxis will have to be quantified in stigmatized populations as MSM in Brazil.