Respondent-driven sampling as Markov chain Monte Carlo

Sharad Goel; Matthew J Salganik

doi:10.1002/sim.3613

Stat Med. 2009 Jul 30;28(17):2202-29. doi: 10.1002/sim.3613.

Authors

Sharad Goel¹, Matthew J Salganik

Affiliation

¹ Yahoo! Research, New York, NY 10018, U.S.A.

Abstract

Respondent-driven sampling (RDS) is a recently introduced, and now widely used, technique for estimating disease prevalence in hidden populations. RDS data are collected through a snowball mechanism, in which current sample members recruit future sample members. In this paper we present RDS as Markov chain Monte Carlo importance sampling, and we examine the effects of community structure and the recruitment procedure on the variance of RDS estimates. Past work has assumed that the variance of RDS estimates is primarily affected by segregation between healthy and infected individuals. We examine an illustrative model to show that this is not necessarily the case, and that bottlenecks anywhere in the network can substantially affect estimates. We also show that variance is inflated by a common design feature in which sample members are encouraged to recruit multiple future sample members. The paper concludes with suggestions for implementing and evaluating RDS studies.
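
The view of RDS as Markov chain Monte Carlo importance sampling can be illustrated with a minimal sketch. It is not the authors' code. It assumes that each respondent recruits exactly one network neighbor uniformly at random, so the referral chain is a simple random walk whose stationary distribution is proportional to degree, and it reweights each sampled person by the inverse of their degree. The two-community graph, the infection pattern, and all function names are hypothetical choices made for illustration only.

```python
import random


def build_two_community_graph(size, n_bridges):
    """Two complete graphs of `size` nodes each, joined by `n_bridges` edges.

    Hypothetical toy network: few bridges means a strong bottleneck.
    """
    adj = {v: [] for v in range(2 * size)}
    for block in (0, size):
        for u in range(block, block + size):
            for v in range(block, block + size):
                if u != v:
                    adj[u].append(v)
    for i in range(n_bridges):
        u, v = i, size + i
        adj[u].append(v)
        adj[v].append(u)
    return adj


def random_walk_sample(adj, start, n_steps, rng):
    """Simulate a single-recruit referral chain as a simple random walk."""
    node = start
    sample = []
    for _ in range(n_steps):
        node = rng.choice(adj[node])  # recruit one neighbor uniformly at random
        sample.append(node)
    return sample


def rds_estimate(sample, degree, infected):
    """Inverse-degree importance-weighted ratio estimate of prevalence."""
    weights = [1.0 / degree[v] for v in sample]
    weighted_infected = sum(w for v, w in zip(sample, weights) if infected[v])
    return weighted_infected / sum(weights)


rng = random.Random(0)
adj = build_two_community_graph(size=20, n_bridges=1)
degree = {v: len(neighbors) for v, neighbors in adj.items()}

# Assumed infection pattern: 30% infected in the first community and
# 70% in the second, so the true population prevalence is 0.5.
infected = {v: (v % 10 < 3) if v < 20 else (v % 10 < 7) for v in adj}

sample = random_walk_sample(adj, start=0, n_steps=200, rng=rng)
estimate = rds_estimate(sample, degree, infected)
print(f"estimate = {estimate:.3f} (true prevalence 0.5)")
```

Repeating the simulation over many seeds and comparing the spread of estimates for a small versus a large `n_bridges` gives a rough feel for how a bottleneck can inflate variance under these assumptions.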

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Biometry
Epidemiologic Methods
Female
HIV Infections / complications
HIV Infections / epidemiology
Humans
Male
Markov Chains*
Models, Statistical
Monte Carlo Method*
New York City / epidemiology
Public Health / statistics & numerical data
Sampling Studies*
Social Support
Substance-Related Disorders / complications

Grants and funding

R24 HD047879/HD/NICHD NIH HHS/United States