Estimating sizes of hidden or hard-to-reach populations is an important problem in public health. For example, estimates of the sizes of populations at highest risk for HIV and AIDS are needed for designing, evaluating and allocating funding for treatment and prevention programmes. A promising approach to size estimation, relatively new to public health, is the network scale-up method (NSUM), involving two steps: estimating the personal network size of the members of a random sample of a total population and, with this information, estimating the number of members of a hidden subpopulation of the total population. We describe the method, including two approaches to estimating personal network sizes (summation and known population). We discuss the strengths and weaknesses of each approach and provide examples of international applications of the NSUM in public health. We conclude with recommendations for future research and evaluation.
- Population size
- personal network size
- summation method
- known population method
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.
In many countries, the risk of acquiring and transmitting HIV is highest among people who inject drugs, people who exchange sex for money and/or men who have sex with men. Accurate and timely information about the number of people who practise these behaviours is necessary for the design and evaluation of public health policy as well as for the allocation of resources for treatment and prevention programmes. Collecting this information with traditional survey and sampling methods has proved complicated. Individuals practising these behaviours may be difficult to find and, even when they agree to participate in a survey, may not report accurately on behaviours that are stigmatised or illegal.
Several methods have been used for estimating the size of the populations with these behaviours. These include indirect sample estimation methods,1 2 enumeration methods,3 capture–recapture techniques,4–6 multiplier methods,7 synthetic estimation8 and multivariate indicator methods.9 For many countries, these methods may not be feasible for developing reliable estimates of population size. Enumeration methods can involve months of fieldwork to access individuals and still fail to identify individuals who practice the behaviour. Capture–recapture techniques require two or more valid, representative and independent samples of a population as well as a method to uniquely identify which individuals were recruited in more than one sample. Synthetic estimates and multivariate indicator methods are computationally intensive and may require data for each area in the country for which the estimate will apply.
A potential solution is a relatively new (to public health) technique for estimating the size of hidden or hard-to-reach populations: the network scale-up method (NSUM). We describe the background of the method, the results of its applications in public health, and an evaluation of its strengths and limitations. Finally, we report areas of further work in research and public health implementation for improving the method's utility for programming and planning, based on the consensus of an expert panel (see online supplementary appendix 1).
The network scale-up method: background and methodology
The NSUM has its roots in efforts by anthropologists, mathematicians and social network analysts to estimate the size of hard-to-count populations.10 It was first used to estimate the number of deaths from an earthquake in Mexico11 and has been used subsequently in several settings (table 1).
The method rests on the assumption that people's social networks (the set of people whom they ‘know’) are, on average, representative of the general population in which they live and move. Under this assumption, if a sample of respondents reports knowing 300 people on average, two of whom inject drugs, we estimate that 2/300 (about 0.67%) of the general population inject drugs. We combine this estimated prevalence with known information about the size of the general population to produce an estimate for the number of people in the population who inject drugs.
The accuracy of the estimate can be improved by combining responses from many respondents as follows:
$$\hat{e} = \frac{\sum_{i} m_i}{\sum_{i} \hat{c}_i} \times N \qquad (1)$$
where ê is the estimated size of the hidden population, mi is the number of people in the hidden population known by person i, ĉi is the estimated personal network size of person i, and N is the size of the general population.12 The size of the general population, N, is assumed to be known (from sources such as census information), whereas mi and ĉi are determined from data collected in network scale-up survey interviews. Respondents are asked to supply the number of people they know with the behaviour of interest (eg, ‘How many people do you know who inject drugs?’); answers to these questions are the mi in equation (1). Respondents are also asked a series of questions, described in the ‘Estimating personal network size’ section, to estimate their personal network size (ie, their numbers of social acquaintances, called their network degrees), the ĉi in equation (1). Interviews may be conducted face to face, over the telephone or using a ballot-box or computer-assisted self-interviewing technique to ensure confidentiality.
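The calculation in equation (1) can be illustrated with a short sketch. It assumes the ratio form ê = N × Σmi / Σĉi and that mi and ĉi come from the same respondents; the function name `scale_up_estimate` and all response values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def scale_up_estimate(m: Sequence[float], c_hat: Sequence[float], N: float) -> float:
    """Equation (1), assumed ratio form: e_hat = N * sum(m_i) / sum(c_hat_i)."""
    if len(m) != len(c_hat):
        raise ValueError("m and c_hat must describe the same respondents")
    total_network = sum(c_hat)
    if total_network <= 0:
        raise ValueError("estimated network sizes must sum to a positive number")
    return N * sum(m) / total_network


# Hypothetical responses from four respondents (illustrative values only).
hidden_known = [2, 0, 1, 3]            # m_i: people known who inject drugs
network_sizes = [300, 250, 310, 340]   # c_hat_i: estimated personal network sizes
population = 1_000_000                 # N: general population size (assumed known)

estimate = scale_up_estimate(hidden_known, network_sizes, population)
print(estimate)
```

Pooling the sums across respondents, rather than averaging each respondent's own ratio mi/ĉi, keeps respondents with very small estimated networks from dominating the result.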
The NSUM estimate of population size differs from capture–recapture, multiplier or nomination estimates in several ways. First, NSUM does not access the target population directly; instead, a random sample of the general population reports about members of the target population. Second, if dictated by cost or logistical constraints, the data needed for the numerator and denominator of equation (1) can be collected in two different samples with different sizes (although, if this is done, care must be taken to reduce non-sampling errors between the two surveys). Third, NSUM uses estimates of numbers known in many subgroups, whereas the classical survey methods ask for a single set of individuals with associated attributes. Finally, N represents the known size of the total population, not a multiplier benchmark.
Clearly, a fundamental concept underlying NSUM is the definition of what it means to have a person in one's personal network or what it means ‘to know’ someone. One definition of ‘know’ that has been used in the United States is: they live in the area (to which the estimate will apply), you know them, they know you, you have had contact with them in the last 2 years, and you could get in touch with them if needed.12 This definition may be adjusted in response to local culture and language, but respondents need a clear and consistent definition of how to classify people into their personal networks. To obtain a reasonably accurate size estimate for the hidden population, we need to know that members of the hidden population are as easy or difficult to know and report as those in the known subpopulations.
The effect of the definition on the bias and variance of the estimate could depend on numerous assumptions about the patterns of network contacts in the population and assumptions about responses to these types of questions. For example, a broad definition of ‘knowing’ could expose the estimate to more reporting biases, and a narrow definition could reduce the precision of the estimate because there will be fewer individuals identified as belonging to the hidden population. Even though different definitions of knowing may expand or contract the size of a personal network, NSUM uses only the proportion of those in the network who practice the behaviour of interest (equation (1)). Thus, the effect of the choice of definition may be modest, and we suggest this as a valuable topic of further research (Le Bao, personal communication).
Two concepts are important in assessing the utility of NSUM. First, people may not know everything about members of their personal network. For example, a respondent may not know that a person in their network injects drugs. This is termed the transmission bias: the contact has not transmitted (relevant but sensitive) information to the respondent. Second, social and physical barriers, such as ethnicity, race, occupation and location of residence, may cause variation in the likelihood that respondents know people in hidden populations18 19; this is called the barrier effect.
Estimating personal network size
The first component of the NSUM calculation is an estimate of respondents’ personal network sizes (ĉi in equation (1)), a challenging task in itself. Even with a clear definition of ‘knowing’, direct questions such as ‘How many people do you know?’ are not likely to yield accurate responses given the difficulties with self-reported network data.20 Instead, two indirect methods have been used in NSUM studies to estimate personal network size: the known population method12 21 and the summation method.22
The known population method
To estimate ĉi using the known population method, each respondent is asked about the number of people they know in various populations of known size. For example, if a respondent in Egypt reports knowing five people named Ahmed, one could combine that with the fact that there were about 2 million men named Ahmed in the country (using birth registration data from 2008). We could estimate that the respondent knows about 5/(2 000 000) = 0.0000025 of all men named Ahmed and, assuming the same fraction applies to the population as a whole, about 0.0000025 of all Egyptians. As there are approximately 90 million Egyptians (from census data), we would estimate that the respondent has a personal network size of 0.0000025 × 90 000 000 = 225. To reduce the variance of this estimate, we ask about many populations of known size.
When using multiple populations of known size (denoted j), an estimate of network size is computed as:
$$\hat{c}_i = \frac{\sum_{j} m_{ij}}{\sum_{j} e_j} \times N \qquad (2)$$
where ĉi is the estimated personal network size of person i, mij is the number of people in population j known by person i, ej is the actual size of population j (known from census reports, etc), and N is the size of the general population (also known).12 21
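A minimal sketch of the known population calculation follows, assuming equation (2) has the ratio form ĉi = N × Σj mij / Σj ej. The function name is hypothetical; the first call reproduces the Ahmed example from the text, and the remaining known populations and counts are invented for illustration.

```python
from typing import Sequence


def known_population_network_size(
    m_ij: Sequence[float], e_j: Sequence[float], N: float
) -> float:
    """Equation (2), assumed ratio form, for a single respondent i."""
    if len(m_ij) != len(e_j):
        raise ValueError("need one reported count per known population")
    total_known = sum(e_j)
    if total_known <= 0:
        raise ValueError("known population sizes must sum to a positive number")
    return N * sum(m_ij) / total_known


# Single-population case from the text: 5 people named Ahmed known,
# about 2 million men named Ahmed, about 90 million Egyptians.
ahmed_only = known_population_network_size([5], [2_000_000], 90_000_000)

# Several known populations for the same respondent.
# Only the first entry comes from the text; the others are hypothetical.
counts = [5, 1, 2]                       # m_ij
sizes = [2_000_000, 300_000, 700_000]    # e_j
pooled = known_population_network_size(counts, sizes, 90_000_000)

print(ahmed_only, pooled)
```

With a single known population the result matches the worked example (225); adding populations pools more information into the same ratio, which is why the variance falls as the list grows.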
In the application of NSUM to the Mexico City earthquake,11 five populations of known size were used: doctors, mail carriers, bus drivers, TV repairmen and priests. Because the accuracy of the estimate increases with the number of known populations used, we recommend using at least 20 subpopulations.12 21 A list of known populations used in selected NSUM studies can be found in online supplementary appendix 2.
One advantage of the known population method for estimating personal network size is that it can be embedded into a statistical framework to quantify variance.18 23 However, statistics for populations of known size (people's names, commercial pilots, etc) may not be available in some countries. In addition, the NSUM estimate may be biased, depending on the known-size populations used for estimation.18 Finally, field implementation has shown that respondents under-report the number of people known in larger populations and over-report the number known in smaller populations,12 24 although statistical corrections for this response bias are possible.25–27
The summation method
The summation method is an approach to estimating personal network size when data for known populations may be missing or unreliable. Here, respondents are asked to enumerate the people they know in a list of specific relationship types or categories, such as family, neighbours, coworkers, etc. The summation of these responses yields an estimate of personal network size. At a workshop to pilot test NSUM in Thailand, participants developed a list of 17 culturally appropriate categories by free-listing relationship categories and then eliminating categories where there might be overlap.
One advantage of the summation method is that it may be easier for respondents to provide accurate answers and it does not require data for populations of known size. One limitation is that it is difficult to construct a list of perfectly mutually exclusive groups (eg, someone who is a coworker may also be a neighbour), which potentially leads to overcounting. Undercounting could result if a substantial group is omitted from the list. Further, the set of relationship categories must be chosen carefully on a country-by-country basis to match the categories that respondents use to organise relationships. Finally, as the summation approach is not embedded within any statistical framework, it is difficult to quantify the uncertainty in the estimate.
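The summation calculation itself is a simple total, as the sketch below shows. The category names and counts are hypothetical (only family, neighbours and coworkers are mentioned in the text), and the sketch assumes the categories are mutually exclusive, which the paragraph above notes is hard to guarantee in practice.

```python
from typing import Mapping


def summation_network_size(counts_by_category: Mapping[str, int]) -> int:
    """Personal network size as the sum of counts over relationship categories.

    Assumes the categories do not overlap; overlapping categories would
    count the same person more than once.
    """
    if any(count < 0 for count in counts_by_category.values()):
        raise ValueError("counts must be non-negative")
    return sum(counts_by_category.values())


# Hypothetical responses from one respondent.
responses = {
    "family": 25,
    "neighbours": 30,
    "coworkers": 40,
    "schoolmates": 15,   # hypothetical category, not taken from the text
}

total = summation_network_size(responses)
print(total)
```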
A direct comparison of the known population and summation methods in the United States22 yielded similar estimated personal network sizes. The methods were also similar in terms of time taken to interview each respondent: the known population method with 29 populations took an average of 7 min per respondent, and the summation method with 16 categories took an average of 5 min per respondent.
Applications of NSUM in public health
Researchers in Italy used NSUM to estimate the number of children injured from ingesting foreign bodies.15 They evaluated the relationship between the mean number of people known and the size of the subpopulation to detect subpopulations that were significantly over- or under-reported. Based on these findings, researchers eliminated subpopulations for which it appeared that respondents were less able to recall accurately.
Researchers in the Ukraine used NSUM to estimate the number of people who inject drugs, the number of people who exchange sex for money and the number of men who have sex with men.16 They found that NSUM estimates for the number of people who inject drugs were similar to estimates made from multiplier methods (using drug registration and programme coverage multipliers). On the other hand, the estimated number of women exchanging sex for money and the number of men who have sex with men were significantly lower than multiplier estimates. Along with reliable estimates for the number of people who inject drugs, investigators in the Ukraine found that NSUM could be implemented as part of a general population survey with minimum expenditure of resources.
The Ukraine investigators examined response bias and transmission bias. For response bias, respondents were asked to rate, using a scale of 1=very low to 5=very high, their level of respect for members of various population groups, including people with the behaviours that were the target of the estimates. Investigators weighted data from the respondents (about members of the high risk group, for example, men who have sex with men) by a factor Wi=Mi/M3 where Mi is the average number of men who have sex with men in the network of all people with respect level i for men with that behaviour and M3 is the average number of men who have sex with men in the network of people with a medium level of respect. For transmission bias, the investigators asked respondents who were men who have sex with men how many of those respondents' acquaintances know about that behaviour.16
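The respect-level factors Wi=Mi/M3 described above could be computed along the following lines. This is only a sketch: the function name and the data are hypothetical, and the text does not say how the factors were then applied to the responses, so that step is deliberately left out.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Sequence


def respect_weights(
    respect_levels: Sequence[int],
    hidden_counts: Sequence[float],
    reference_level: int = 3,
) -> Dict[int, float]:
    """Return W_i = M_i / M_ref for each respect level i.

    M_i is the mean number of hidden-population members reported by
    respondents with respect level i; the reference is the medium level (3).
    """
    by_level = defaultdict(list)
    for level, count in zip(respect_levels, hidden_counts):
        by_level[level].append(count)
    means = {level: mean(counts) for level, counts in by_level.items()}
    reference = means.get(reference_level)
    if not reference:
        raise ValueError("reference level is missing or has a zero mean")
    return {level: means[level] / reference for level in sorted(means)}


# Hypothetical data: respect level (1 to 5) and number known, per respondent.
levels = [1, 1, 3, 3, 5, 5]
known = [0, 1, 1, 1, 2, 4]

weights = respect_weights(levels, known)
print(weights)
```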
Researchers in Moldova implemented NSUM based on the experience in the Ukraine. Unpublished analysis indicates underestimation of the number of people in most high risk populations compared to estimates from other methods. The Ukraine adjustment for respect did not produce expected results in Moldova; specifically, people who reported knowing more people with the high risk behaviour in Moldova were more likely to display low levels of respect for those people. Investigators stratified their sample by age to examine barrier effects; that is, to investigate the hypothesis that younger people may report more people with the risk behaviours in their personal networks than do older people. Finally, they hypothesised that both transmission and barrier effects may result in different network sizes between urban and rural areas. Analysis is ongoing.
The NSUM offers promise in estimating the numbers of people with high risk behaviours and suggests a programme of theoretical and empirical research. The examples from international applications in estimating sizes of populations at risk of HIV show encouraging results. However, several potential biases must be addressed to assure the utility of the method for public health.
First, transmission error is particularly likely in settings where the behaviour of interest is highly stigmatised. Two US studies26 27 found that people living with HIV withheld their serostatus from many of their family, friends and acquaintances. Personal networks of those living with HIV were significantly smaller than those of the general population, about 1/3 the size of the general population networks examined.24
Researchers in Brazil implemented a network survey to estimate transmission error. Building on earlier work,19 they asked sampled members of the high risk group about people socially connected to the respondent. The respondent then provided information about each of these social contacts, including whether they were aware that the respondent was in the high risk group.28 In another example, one author (SW) included a similar question in a survey of women working in venues where people meet new sexual partners; women were asked if they exchanged sex for money, self-identified as a sex worker and whether their friends identified the respondent as a sex worker or not. Results suggest that these procedures could be used as an adjustment factor in the calculation of population size estimates.17
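One way such an adjustment factor might enter the calculation is sketched below. The form of the adjustment is an assumption, not something stated in the text: the unadjusted estimate is divided by the share of a hidden-population member's contacts who are aware of that membership. Function names and data are hypothetical.

```python
from typing import Sequence


def visibility_factor(aware_flags: Sequence[Sequence[bool]]) -> float:
    """Share of reported contacts who know the respondent is in the group.

    aware_flags[r][k] is True if contact k of hidden-population
    respondent r is aware of the respondent's membership.
    """
    total = sum(len(flags) for flags in aware_flags)
    if total == 0:
        raise ValueError("no contacts reported")
    aware = sum(sum(flags) for flags in aware_flags)
    return aware / total


def adjust_for_transmission(estimate: float, visibility: float) -> float:
    """Assumed adjustment: inflate the estimate by 1 / visibility."""
    if not 0 < visibility <= 1:
        raise ValueError("visibility must be in (0, 1]")
    return estimate / visibility


# Hypothetical data: two respondents, four contacts each.
flags = [
    [True, False, True, True],
    [False, False, True, True],
]
tau = visibility_factor(flags)
adjusted = adjust_for_transmission(5000.0, tau)
print(tau, adjusted)
```

Under this assumed form, lower visibility always raises the estimate, so the quality of the visibility data matters as much as the scale-up survey itself.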
Second, the application of the NSUM requires the assumption that the network of social contacts in the general population is essentially random. Clearly, this assumption is not reasonable in the context of behaviours that increase the risk of HIV. Use of a diverse set of known populations that are not likely to be confounded with the clustering of social contacts in the general population can address this problem.23
Third, people may know someone in a hidden population and be aware of that fact, but not report this information in an interview because of the sensitive nature of the behaviours involved. However, NSUM does not require respondents to identify themselves as a member of the hidden population. Methods using randomised response techniques29 (for estimation of size for a single population), ballot-box techniques30 or audio computer-assisted self-interviewing techniques31 can assure respondents of confidentiality and decrease this response bias. Finally, respondents tend to under-report the number of people they know in larger populations of known size and over-report the number of people they know in small ones.12 21
Fourth, barrier effects are particularly important in this context. One example26 found that relatives who were not in the same town as the informant would not be told about the informant's HIV positive status until that information could be relayed in person. These barrier effects can be particularly problematic if those who are more likely to know members of the hidden population are less likely to be included in the sample, either because of incompleteness of the sampling frame or non-response. For example, if the survey is conducted by telephone using random digit dialling, but those without telephones are more likely to know people who inject drugs, the NSUM will underestimate the true number of people who inject drugs. Conversely, if the sampling procedure systematically excludes rural residents, and rural residents are less likely to know people who inject drugs, the procedure might overestimate the number of people who inject drugs in the entire country.
One additional area of investigation is to quantify the uncertainty in population size estimates. However, it is important to note that methods for doing so quantify only sampling error and neglect all forms of non-sampling error, such as response error.32
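As an illustration of what quantifying sampling error might look like, the sketch below resamples respondents with replacement (a percentile bootstrap) and recomputes a ratio-of-sums scale-up estimate each time. The bootstrap is a swapped-in generic technique, not a procedure described in the text; the function names and data are hypothetical, and, consistent with the caveat above, the interval reflects sampling error only.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]  # (m_i, c_hat_i) for one respondent


def scale_up(pairs: Sequence[Pair], N: float) -> float:
    """Ratio-of-sums scale-up estimate for a set of respondents."""
    return N * sum(m for m, _ in pairs) / sum(c for _, c in pairs)


def bootstrap_interval(
    pairs: Sequence[Pair],
    N: float,
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval obtained by resampling respondents."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(pairs)
    estimates: List[float] = []
    for _ in range(n_boot):
        resample = [pairs[rng.randrange(n)] for _ in range(n)]
        estimates.append(scale_up(resample, N))
    estimates.sort()
    lower_idx = max(0, int(alpha / 2 * n_boot))
    upper_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return estimates[lower_idx], estimates[upper_idx]


# Hypothetical survey responses: (number known in hidden population, network size).
data = [(2, 300), (0, 250), (1, 310), (3, 340),
        (0, 180), (1, 220), (4, 400), (0, 150)]

point = scale_up(data, 1_000_000)
low, high = bootstrap_interval(data, 1_000_000)
print(point, low, high)
```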
We found four advantages of NSUM. First, because it is based on a random sample of the general population (which includes the group at increased risk), it does not require members of a hidden population to expose their behaviour. While general population surveys are neither easy nor cheap, questionnaire items for NSUM can be embedded in national surveys when planned. Thus, data for estimates can be collected in a standardised way across space and time, something that is extremely difficult if a method requires contact with the hidden population.
Second, the majority of the time required for people to provide responses to NSUM questions is taken in estimating personal network size (using either the known population or summation method). These questions can be embedded in any existing national survey, and data needed to estimate personal network size can be collected on a separate subsample of the general population survey if necessary.
Third, estimates for the sizes of several hard-to-count populations can be obtained simultaneously from a single data collection effort. Fourth, the known population method allows for internal consistency checks that can suggest deficiencies in modelling assumptions or data collection and can gauge the effect of adjustments. This affords the possibility of making incremental improvements in the accuracy of the estimates.
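One form such an internal consistency check might take is a leave-one-out back-estimate: each known population is treated in turn as if it were hidden, network sizes are estimated from the remaining known populations, and the scale-up estimate is compared with the true size. This is a sketch of one plausible check, not necessarily the authors' procedure; it assumes ratio-of-sums forms for equations (1) and (2), and the function name and data are hypothetical.

```python
from typing import List, Sequence, Tuple


def back_estimates(
    M: Sequence[Sequence[float]], e: Sequence[float], N: float
) -> List[Tuple[float, float]]:
    """Return (true size, back-estimated size) for each known population.

    M[i][j] is the number of people respondent i knows in known population j;
    e[j] is the true size of population j.
    """
    results = []
    n_pops = len(e)
    for k in range(n_pops):
        others = [j for j in range(n_pops) if j != k]
        e_others = sum(e[j] for j in others)
        # Network sizes from all known populations except k (equation (2)).
        c_hat = [N * sum(row[j] for j in others) / e_others for row in M]
        if sum(c_hat) <= 0:
            raise ValueError("cannot back-estimate with empty networks")
        # Scale-up estimate treating population k as hidden (equation (1)).
        estimate = N * sum(row[k] for row in M) / sum(c_hat)
        results.append((e[k], estimate))
    return results


# Hypothetical, perfectly consistent data: two respondents with true network
# sizes 200 and 100 in a population of 1 000 000.
true_sizes = [10_000, 20_000, 50_000]
reports = [
    [2, 4, 10],
    [1, 2, 5],
]

checks = back_estimates(reports, true_sizes, 1_000_000)
for true_size, estimate in checks:
    print(true_size, estimate)
```

In real data the back-estimates will not match exactly; large or systematic gaps for particular populations are the kind of signal that could point to recall problems or barrier effects.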
Some limitations for NSUM can be addressed by studying members of the hidden population themselves. One study26 carried out in-depth interviews with HIV positive individuals and asked them who they would and would not tell about their HIV status. Resulting estimates of transmission error can be combined with results from the NSUM in the general population to produce improved size estimates. Examples from the Ukraine, Moldova and Brazil show how adjustments might be made for transmission error.
Other limitations can be addressed through studies in the general population. For example, the plausibility of the mi estimates can be assessed by comparing them with demographic and spatial information collected during the NSUM interviews: spatial patterns of knowing members of the hidden populations should match local expert knowledge. Additionally, individual responses should vary in expected ways; for example, in the United States, poorer urban respondents may be more likely to know people who inject drugs. Further, consistency of responses could be checked using a design to vary question ordering and definition terminology (eg, ‘men who have sex with men’ versus ‘gay’ versus ‘gay/bisexual’). Because response bias is a problem for the general population, accuracy of responses could be checked by interviewing specific respondents who are already known to be acquainted with members of the hidden population.
Future work involving both mathematical modelling and computer simulation should investigate the consequences of barrier effects, transmission error and response bias on the bias and variance of the estimates. Further, additional investigation of the sources of variance will offer guidance on the design of future studies.
Finally, direct empirical application of the NSUM along with other size estimation methods on the same population will be important to assess the relative strengths, weaknesses and biases of each method. In addition to these specific studies, comparison of data across studies will be needed to assess the robustness of various modelling assumptions. These cross-study comparisons will be facilitated by the use of common populations in the known population method or common categories in the summation method. Important to this effort will be the sharing and the public release of the data. These cross-study comparisons will allow for rapid developments in the understanding and applicability of NSUM but will depend on the willingness of the scientific community to work collaboratively toward these goals.
NSUM estimates can be validated in two ways. First, they can be compared to other size estimates produced from direct methods to determine criterion-related validity. The second method is to ask experts in each of the countries to evaluate the NSUM estimates. This face validity, referring to the degree to which something measures what it appears to measure, lacks a statistical basis but is critical in practical public health work. Estimates from the early 1990s for the number of persons using illegal drugs in the United States were not used effectively because of policy-makers' perceptions of limited credibility.33
Critical to this effort will be the sharing of methods and data from NSUM applications. This collaborative evaluation effort will provide increasingly accurate estimates across the world of the sizes of groups at highest risk for HIV/AIDS. This knowledge should facilitate the management and planning of effective prevention programmes to address the HIV/AIDS pandemic across the world.
The Network Scale-Up Method (NSUM) offers promise in estimating the sizes of populations at risk for HIV and AIDS.
Examples from international field applications of NSUM in estimating sizes of populations at risk of HIV show encouraging results.
Biases with potential impact on the method have been addressed through field modifications and present opportunities for methodological work.
Comparisons of NSUM results with those from traditional methods will increase utility for countries estimating the size of hidden populations at risk of HIV.
Gene A Shelley is now with the Centers for Disease Control and Prevention, Atlanta, Georgia, USA.
In Memoriam: Peter D Killworth, oceanographer and social scientist.
The foundations of this work were presented at an Expert Symposium on Network Scale-Up Methods convened by UNAIDS, New York City, New York, September 2008.
Funding The preparation of this article was partially supported by UNAIDS. Work to develop the NSUM method was supported by the National Science Foundation. MJS acknowledges funding from the National Institutes of Health (NICHD), USA.
Competing interests None.
Provenance and peer review Not commissioned; externally peer reviewed.