Estimating the population size of men who have sex with men: a modified Laska, Meisner and Siegel procedure taking into account internet populations
- Hao Chen1,
- Yanhui Zhang1,
- Hongzhuan Tan1,
- Dan Lin1,
- Mengshi Chen1,
- Niannian Chen1,
- Yugang Bao1,
- Shiwu Wen1–4
- 1Department of Epidemiology and Health Statistics, School of Public Health, Central South University, Changsha, Hunan, the People's Republic of China
- 2OMNI Research Group, Department of Obstetrics and Gynecology, University of Ottawa, Ottawa, Canada
- 3Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Canada
- 4Department of Epidemiology and Community Medicine, University of Ottawa, Ottawa, Canada
- Correspondence to Hongzhuan Tan, Department of Epidemiology and Health Statistics, School of Public Health, Central South University, No 88 Xiangya Road, Changsha, Hunan 410008, the People's Republic of China
- Received 8 February 2012
- Revised 18 July 2012
- Accepted 28 July 2012
- Published Online First 1 September 2012
Objectives Men who have sex with men (MSM) are an at-risk population for HIV/AIDS. Accurately estimating the size of the MSM population is important for monitoring the HIV/AIDS epidemic and for implementing HIV/AIDS prevention in this population. None of the current methods for MSM population size estimation is satisfactory, especially for internet samples. We used a modified Laska, Meisner and Siegel (LMS) method to estimate the size of the MSM population in tangible venues and internet virtual venues.
Methods Laska, Meisner and Siegel developed an unbiased estimator for the size of a population in a single venue based on a single sample (LMS method). In this study, we modified the LMS method for the estimation of the population size of MSM (LMS* procedure). Specifically, we integrated the MSM size of traditional tangible venues with that of internet virtual venues, which are currently an important source of socialisation for the MSM population. To do this, we added a few parameters to the original LMS method. We then applied the LMS* procedure to estimate the size of the MSM population in Changsha, the capital city of the Chinese Province of Hunan.
Results The LMS* procedure handled the complexity of socialisation among MSM population well. According to the LMS* procedure, the total number of MSM was 65 657 (95% CI 57 922 to 73 388), constituting a proportion of 5.43% (95% CI 4.79% to 6.07%) in the sexually active male population (15–64-year-olds) in Changsha.
Conclusions We conclude that the LMS* procedure is suitable for the estimation of a hard-to-reach population, such as MSM, in tangible venues and internet virtual venues.
Men who have sex with men (MSM) are an at-risk population for HIV/AIDS. According to the results of a HIV infection survey carried out in eight developed countries in North America, Western Europe and Australia, the HIV notification incidence increased by an estimated 3.3% per year from 2000 to 2005.1 In Jakarta, Indonesia, HIV prevalence in MSM in 2007 was 8.1%, up from 2.0% in 2003.2 In China, HIV prevalence in MSM increased to 5.8% in 2006, up from 0.4% in 2004.3
MSM have become one of the major sources of new HIV infections.3 ,4 In 2006, about half the total new HIV infections were found in MSM.4 In China, the proportion of new HIV infections attributed to MSM grew rapidly from 12.2% in 2006 to 32.5% in 2009, and this trend is accelerating.3 The risk of HIV infection for MSM in China was 45 times greater than in the general population, whereas it was 19 times greater in other developing countries in Asia.3
China has established an HIV/AIDS monitoring system, which allows transmission sources to be reported for new HIV infections. However, it remains a challenge to obtain precise sizes for some hard-to-reach populations, including MSM. Accurate estimation of the MSM size is important for monitoring the HIV/AIDS epidemic and for implementing HIV/AIDS prevention in the MSM population.
Several methods have been developed to estimate MSM size. Capture–recapture has been used in estimating the size of MSM, female sex workers, injecting drug users, and other hard-to-reach populations.5–10 However, some investigators do not believe MSM meet the conditions for capture–recapture analysis.8 ,9 Moreover, identifiers of MSM can only be obtained by subjective recall. Snowball sampling and respondent-driven sampling require participants to be identified. Because most MSM are sensitive about their personal information, they refuse to collaborate, or they deliberately provide false information.11 Those methods can therefore lead to biased estimates of the MSM population. The multiplier method is costly and is prone to selection bias.12 A recent study estimated MSM size in London primary care trusts with the SOPHID-weighted method.13 However, most developing countries do not have a good primary care system, so this method cannot be implemented there.
A growing number of MSM are socialising online.14–17 Surveys without considering virtual venues cannot measure MSM size properly. According to a recent study, HIV patients tended to make homosexual friends and participate in group activities through the internet.18 Moreover, MSM who preferred online activities were at increased risk of STD.16–18 Therefore, seeking a suitable method to take into account virtual venues is important for MSM population estimation.
In 1988, Laska, Meisner and Siegel developed the LMS method. The LMS method has been used to estimate the size of mental health outpatient populations,19 risky sexual populations and injection drug users.20 The LMS method improves on the capture–recapture model, so we considered LMS an ideal method to estimate the MSM population. In this study, we attempted to estimate the MSM population size with a modified LMS method, the LMS* procedure. In addition to ascertaining the MSM population size in traditional tangible venues, this modified method can identify the MSM population size in virtual venues.
In brief, LMS enables estimation of a population size based on a single survey. Furthermore, it just needs to ascertain the time of last attendance at venues. In this paper, we propose a modified LMS that can be used to estimate the MSM size both in tangible venues and virtual venues (internet). To mitigate the problem caused by potential overlaps (eg, MSM who visit multiple venues during the same period), we added some parameters into the modified LMS model.
The LMS method assumes that individuals in the target population appear on K weekly lists, but are only observable in the last week. Individuals appear on the lists by engaging in a well-defined activity one or more times during a specified time period, such as K weeks. To implement the LMS method, a survey of individuals engaging in the activity during the specified period is conducted, using x to represent the number of individuals engaging in the previous week, and P to represent the probability of being engaged in the previous week given that the person is engaged at least once during the K weeks. Then, an unbiased maximum likelihood estimator of population size is given by the greatest integer in x/P. Because P is unknown, we let pi represent the probability of being engaged in week i, which permits an unbiased maximum likelihood estimator to be obtained from the relationship between P and pi. Based on the parameters x and P, we can estimate the size of the target population.
In the estimator, r represents the sampling ratio, which combines the ratio among venues (r1) and the ratio among respondents (r2), r=r1*r2, and mi denotes the number of individuals in the sample on the survey day in week K (the survey week) who last engaged in the activity i weeks before the survey week, i=1,2,3…K weeks. A sufficient condition for the LMS estimator to be unbiased for N is given by the following:19 (1) The probability of an individual engaging in the activity of interest in week K is equal to the average of the probabilities of an individual engaging in the activity of interest in weeks 1, 2, …, K − 1. This condition ensures that data are collected during a typical period of time when the activity occurs. (2) There should be no substantial in-migration to or out-migration from the target population. (3) Individuals can identify the time when the activity of interest last occurred. Further discussion of sufficient conditions for the LMS estimator to be unbiased is provided by Laska et al.19
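As a minimal sketch of the point estimate described above, the greatest integer in x/P can be computed as follows. The function name `lms_point_estimate` and the example counts are hypothetical, P is treated as known (in practice it has to be estimated from the survey), and the sampling ratio r is not applied here.

```python
from fractions import Fraction
from math import floor


def lms_point_estimate(x: int, p) -> int:
    """Greatest integer in x / P.

    x: number of individuals observed engaging in the activity.
    p: probability of engaging in the observed week, given engagement
       at least once during the K weeks (assumed known in this sketch).
    """
    # Convert through str() so that a value such as 0.4 is treated as
    # exactly 2/5 and the floor is not disturbed by binary rounding.
    p_exact = Fraction(str(p))
    if not 0 < p_exact <= 1:
        raise ValueError("p must lie in the interval (0, 1]")
    if x < 0:
        raise ValueError("x must be non-negative")
    return floor(Fraction(x) / p_exact)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the survey.
    print(lms_point_estimate(120, 0.4))  # 300
    print(lms_point_estimate(100, 0.3))  # 333
```

Exact rational arithmetic is used only because the estimator takes a floor, where a tiny floating-point error could shift the result by one.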
The modified LMS, LMS* procedure
To estimate MSM size in a geographic or administrative area, one needs to first understand MSM venues and MSM activities. In general, MSM venues include both tangible venues and virtual (internet) venues. Tangible venues are those places where MSM can meet physically, such as sauna rooms, bars and so on, whereas virtual venues consist of websites, chat rooms and instant messaging tools such as QQ, the most widely used instant messaging tool in China. Because ordinary people find it very hard to gain access to or log into MSM venues, we assumed that all visitors to these venues are MSM. Subjects in tangible venues are those who engage in MSM activity in a tangible place during the survey period. Subjects in website venues are MSM who log in to MSM websites. Subjects in QQ group venues or chat rooms include the following: (1) Individuals who communicate in public, where their messages can be seen by other members of the QQ groups or chat rooms, also known as ‘observed users’. (2) Individuals who chat privately with other MSM in QQ groups or chat rooms, also known as ‘private-active users’. With the collaboration of the local Center for Disease Control (CDC) and gay organisations, we identified four tangible venues for the MSM population in Changsha, including three sauna rooms and one music bar. For virtual venues, there were about 100–150 QQ groups and one website for MSM. We surveyed all four tangible venues, 60 QQ groups and the MSM website.
There are two steps in MSM size estimation using the LMS* procedure. In the first step, the size at a single venue is calculated using the original LMS method. In this step, every respondent is asked when was the last time he attended a tangible venue. In terms of QQ groups, the last communication from respondents is determined by browsing chatting records. For MSM websites, accurate log-in information is obtained from the web server. The number of individuals (mi) in the survey week who last visited tangible venues and QQ, or last logged-in to the website, i weeks before survey week, i=1,2,3…K weeks, is then determined.
The peak time for website visiting is usually on weekends. Because it is hard for participants to recall precisely when they last attended a particular venue, we define the survey period as K=1 week. In practice, it is impossible to locate all venues. As a result, the survey targets a subset of the venues during typical rush hours, using r1 to denote the sample fraction of venues identified in the qualitative survey. The sampling proportion r2 denotes the ratio of the number of surveyed respondents (mi) to all visitors in the survey week, with h representing the maximum number of one-day visitors. The size at a single venue is then calculated using equation (1), and the confidence interval of the size is based on equation (2).
MSM socialise in both tangible and virtual venues, and often communicate in different areas. Therefore, in the second step, we used the LMS* procedure to estimate multiple sizes, and to avoid the following biases in our survey: (1) one respondent might visit different venues or units; (2) location of users in QQ or chat rooms is uncertain; (3) the size of private-active users in QQ or chat rooms is uncertain. To avoid possible biases, a random sample of respondents should be interviewed online after the first contact. For the LMS* procedure, we introduced the following parameters to modify the LMS method:
f represents the proportion of virtual venue respondents who are currently located in a particular geographic or administrative area. Since the internet has no boundary, it is necessary to determine whether all respondents are currently located in the study area. One can identify the residence of website users by checking the website's server records. However, obtaining the internet protocol address of QQ or chat room users is impossible. Although most respondents may publicise their real residence for the convenience of socialisation, some of them refuse to reveal their real addresses. The proportion of MSM from virtual venues who are currently living in the study area should therefore be estimated through a sample survey.
As the size of private-active QQ users could not be calculated directly by the original LMS method, the following parameters are introduced in the LMS* procedure: (1) e represents the proportion of observed QQ users, so 1−e is the proportion of QQ users who are non-public speakers. (2) p represents the proportion of private-active QQ users among non-public speakers. (3) t represents the proportion of private-active QQ users among all QQ users, t=p*(1−e). The ratio t/e is the ratio of the size of private-active QQ users to the size of observed QQ users. The total size of active QQ users is the sum of these two sizes, that is, the observed QQ users size multiplied by (1+t/e). Figure 1 illustrates the interrelationships between the above-mentioned parameters in detail. The characteristics of chat rooms are similar to those of QQ.
d represents the overlap of MSM who visit multiple venues or units during the same period, which can also be obtained through the survey: d1 is the overlap between tangible venues and QQ or chat rooms, d2 is the overlap between the website and QQ or chat rooms, and d3 is the overlap between tangible venues and the website. Because websites, QQ and chat rooms have almost the same properties of communication among MSM, we assume that the proportion of overlap between the website and tangible venues equals the proportion of overlap between the QQ group and tangible venues, d1=d3. In tangible venues, respondents are asked whether they logged in to QQ or a chat room during the survey week. A random sample of QQ or chat room users can be selected and asked the corresponding question about other venues through a one-to-one interview. After merging the numbers of respondents in different venues and units, and subtracting overlaps according to parameter d, a more accurate MSM size can be obtained on the basis of the single-venue sizes from the LMS method. To obtain the MSM size in a geographic or administrative area, we add the private-active size of QQ or chat rooms, and subtract both the MSM living in other areas and the overlaps among multiple venues. This procedure modifies the original LMS method to fit the features of the MSM population, and we refer to it as the LMS* procedure.
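The relationship between e, p and t can be sketched in a few lines. The function names and the example values below are hypothetical and chosen only so that the arithmetic is easy to follow; the sketch assumes, as described above, that the private-active size is the observed size scaled by t/e.

```python
def private_active_share(p: float, e: float) -> float:
    """t = p * (1 - e): share of private-active users among all QQ users."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    if not 0 < e <= 1:
        raise ValueError("e must lie in (0, 1]")
    return p * (1 - e)


def total_active_size(observed_size: float, p: float, e: float) -> float:
    """Observed users plus private-active users.

    The private-active size is the observed size multiplied by t / e,
    so the total is observed_size * (1 + t / e).
    """
    t = private_active_share(p, e)
    return observed_size * (1 + t / e)


if __name__ == "__main__":
    # Hypothetical inputs: a quarter of users speak in public (e = 0.25)
    # and half of the remaining users chat privately (p = 0.5).
    print(private_active_share(0.5, 0.25))     # 0.375
    print(total_active_size(1000, 0.5, 0.25))  # 2500.0
```

With e = 0.25 and p = 0.5, each observed user stands for 1.5 additional private-active users, so 1000 observed users correspond to 2500 active users in total.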
Finally, we compared the results from the capture–recapture method with those from the LMS* procedure. For the capture–recapture method, we captured (first time) the MSM on Saturday and recaptured (second time) 7 days later, on Sunday, in the evening, as these times would provide the highest probability of capturing MSM. To identify the recaptured MSM, we flagged those MSM captured on Saturday evening with a prepaid phone voucher.
All visitors were willing to answer our question about the last time they visited tangible venues, and for virtual venues the records could be downloaded from the MSM website or chat records, so the response rate reached 100% in our survey. A total of 566 subjects were surveyed in tangible venues, 1856 respondents were identified from QQ and 1270 respondents were identified from the MSM website. The numbers of MSM surveyed on rush days (the weekend) were 328 in tangible venues, 663 in QQ groups and 381 on the MSM website. Based on these data, we used LMS to estimate the size N of each venue and calculated the sampling ratio r1 among venues and the ratio r2 among respondents. The estimated numbers of MSM in tangible venues (N1+N2+N3+N4), QQ (N5) and the survey website (N6) were 16 383 (95% CI 11 514 to 21 252), 17 876 (95% CI 16 904 to 18 848) and 8688 (95% CI 8022 to 9354), respectively. Tables 1 and 2 and the supplementary data show the estimation process using the LMS* procedure in detail.
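For readers unfamiliar with capture–recapture, a two-sample estimate is commonly computed with the Lincoln-Petersen formula, N = n1*n2/m. The text does not state which estimator or which counts were used in this comparison, so the sketch below is an assumption for illustration only; the function name and the counts are hypothetical.

```python
def lincoln_petersen(n_first: int, n_second: int, n_recaptured: int) -> float:
    """Two-sample capture-recapture estimate N = n1 * n2 / m.

    n_first:      individuals marked on the first occasion (n1)
    n_second:     individuals observed on the second occasion (n2)
    n_recaptured: individuals on the second occasion who carry a mark (m)
    """
    if n_recaptured <= 0:
        raise ValueError("at least one recapture is needed")
    if n_recaptured > min(n_first, n_second):
        raise ValueError("recaptures cannot exceed either sample size")
    return n_first * n_second / n_recaptured


if __name__ == "__main__":
    # Hypothetical counts: 50 marked, 40 seen later, 10 of those marked.
    print(lincoln_petersen(50, 40, 10))  # 200.0
```

The estimate grows as the number of recaptures shrinks, which is why people who visit a venue only occasionally can strongly influence a two-occasion design.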
We tried to randomly choose some QQ users in order to acquire their residence addresses, surfing habits and information on other tangible and virtual venues during the survey period to adjust the parameters. According to these parameters, the overlap of venues could be estimated and subtracted, making the results more accurate.
In brief, among 110 respondents in the QQ group, 82 lived in Changsha (f=74.55%), thus the MSM size in the QQ group was calculated to be 17 876*74.55%=13 327.
Among 114 respondents in the QQ group who did not chat in public, 83 usually chatted with other persons only privately (p=72.81%). The proportion of observed QQ users in the QQ group was e=20%. So the proportion of private-active QQ users among all QQ users was t=p*(1−e)=58.25%. We calculated the ratio of private-active QQ users to observed QQ users as t/e=72.81%*(1−20%)/20%=2.91. The MSM size in the QQ group was thus modified to 13 327+13 327*2.91=52 142.
Among the random sample of 224 QQ group respondents, 10 attended tangible venues during the survey period (d1=4.46%), so the corresponding overlap should be subtracted from the QQ group size, X1=52 142*d1=52 142*4.46%=2326. We assume that the overlap proportion between the website and tangible venues equals that between the QQ group and tangible venues (d1=4.46%), so we should also subtract the overlap size from the website, X3=N6*d1=8688*4.46%=387.
Among the random sample of 224 QQ group respondents, 38 communicated on the survey website during the survey period (d2=16.96%), so the corresponding overlap should be subtracted from the QQ group size, X2=52 142*d2=52 142*16.96%=8843.
According to the LMS* procedure, the MSM size determined from all venues in Changsha was 65 657 (95% CI 57 922 to 73 388). Therefore, the proportion of MSM in the 1.2097 million sexually active male population (15–64-year-olds) in Changsha was 5.43% (95% CI 4.79% to 6.07%).
Using the capture mark recapture (CMR) method, we estimated that the MSM size in tangible venues was 2636 (95% CI 1369 to 5268), corresponding to 0.22% (95% CI 0.11% to 0.44%) of the 1.2097 million sexually active male population in Changsha. Table 3 compares the MSM population sizes estimated from the LMS and CMR methods in the same tangible venues.
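The adjustment steps reported above can be retraced with a short script. This is a sketch under stated assumptions rather than the authors' own computation: the rounding convention (each intermediate size rounded half up to a whole person, t rounded to four decimals) is assumed, the ratio t/e is taken as 58.25%/20%=2.9125 rather than the printed 2.91, and all function and variable names are hypothetical. The input figures are those given in the text.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value: Decimal) -> int:
    """Round to a whole number, halves up (assumed rounding convention)."""
    return int(value.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def lms_star_changsha() -> dict:
    # Single-venue LMS estimates reported in the text.
    n_tangible = 16383      # N1 + N2 + N3 + N4
    n_qq_observed = 17876   # N5
    n_website = 8688        # N6

    # Adjustment parameters reported in the text.
    f = Decimal("0.7455")   # QQ respondents living in Changsha (82/110)
    p = Decimal("0.7281")   # private-active among non-public speakers (83/114)
    e = Decimal("0.20")     # observed (public) QQ users
    d1 = Decimal("0.0446")  # QQ users who also visited tangible venues (10/224)
    d2 = Decimal("0.1696")  # QQ users who also used the website (38/224)
    males_15_64 = Decimal("1209700")

    # Step 1: keep only QQ users located in the study area.
    qq_local = round_half_up(n_qq_observed * f)

    # Step 2: add private-active users via the ratio t / e.
    t = (p * (1 - e)).quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)
    qq_total = round_half_up(qq_local * (1 + t / e))

    # Step 3: overlaps to subtract (d3 is assumed equal to d1).
    x1 = round_half_up(qq_total * d1)   # QQ and tangible venues
    x2 = round_half_up(qq_total * d2)   # QQ and website
    x3 = round_half_up(n_website * d1)  # website and tangible venues

    total = n_tangible + qq_total + n_website - x1 - x2 - x3
    share = (Decimal(total) / males_15_64 * 100).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    return {
        "qq_local": qq_local,
        "qq_total": qq_total,
        "x1": x1,
        "x2": x2,
        "x3": x3,
        "total": total,
        "share_percent": share,
    }


if __name__ == "__main__":
    for name, value in lms_star_changsha().items():
        print(f"{name}: {value}")
```

Under these assumptions the script reproduces the intermediate values 13 327, 52 142, 2326, 8843 and 387, the total of 65 657, and the 5.43% share. Decimal arithmetic is used so that the rounding at each step is explicit rather than dependent on binary floating point.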
We estimated the MSM population in Changsha as 65 657, representing 5.43% (95% CI 4.79% to 6.07%) of the sexually active male population in the city. According to a WHO report in 2010, the MSM proportion in the male population was 3%–12% in East Asia and Southeast Asia,21 which is consistent with our results. Data from the British National Survey of Sexual Attitudes and Lifestyles showed that the MSM percentage in 16 to 44-year-old males was 5.5% (4.2%, 7.1%) in London, and reached 8.27% (6.36%, 0.69%) in inner London.13 A study using the capture–recapture method in Toronto showed that the MSM proportion in males over 15 years old was 3.86%.22 Based on the results from previous research on low and middle-income countries, the MSM percentage in China was 4%.23 In Hong Kong, an anonymous telephone survey of a large random sample indicated that 4.6% of male adults had sex with men.24 Another study in Shanghai, China, using the multiplier method, found that the MSM proportion in male adults was 7%.25 Overall, reported proportions of MSM among sexually active males were similar to the result obtained using the LMS* procedure.
One of the advantages of the LMS method is its simplicity of use. It requires no private information, so most MSM in tangible venues are willing to answer questions about the last time they visited. For virtual venues, the record could be downloaded from the MSM websites. The internet has become a major tool in MSM socialising, especially for younger generations.26–29 Many previous methods estimated the MSM size only in tangible venues, ignoring the massive online MSM population. Therefore, previous study results might have underestimated MSM size.21 The LMS* procedure proposed in this study can integrate virtual venues, such as QQ, chat rooms and websites in the estimation of MSM size, and therefore, may reduce the bias.
The comparison of MSM sizes estimated from LMS and CMR indicated that the size estimated from LMS is much larger than the size estimated from CMR, even in the same tangible venues, and is closer to the WHO estimate. It is therefore reasonable to believe that the LMS method is more valid and suitable than CMR for estimating the MSM size.
Caution should be applied in the LMS* procedure for several reasons. First, since MSM belong to a subcultural group, it is hard to find all MSM hidden venues. We should identify MSM venues in a certain geographic location as much as possible before the survey. In fact, tangible venues may include parks or other public places, and other unknown or hard-to-reach places. Virtual venues are more difficult to identify,30 because QQ, chat rooms and MSM websites are numerous. In our study, we determined the number of virtual venues based on relevant literature and qualitative interviews. Although full identification was difficult, we were aware of major venues, which enabled a calculation of the sample proportion of virtual venues. Therefore, estimation error should be within permissible range. Second, we assumed that if someone engaged in MSM venues, they were MSM. While we believe that the vast majority of visitors in these internet venues are MSM, we cannot completely exclude the possibility of non-MSM people visiting these sites for unknown reasons (eg, accidentally or driven by curiosity). Recall bias could be another problem. In some studies, the visiting frequency was investigated to confirm the time of last visit and to reduce recall bias.20 In our study, the survey period, also referred to as recalling period, was set at 7 days for respondents to recall more accurately. The third obstacle is that one respondent could visit different venues during the same period. The estimation of overlap between different venues has previously been attempted by randomly choosing some QQ users to acquire their information on other tangible and virtual venues during the survey period to make the results more accurate.
With MSM size estimates obtained through LMS, evidence-based health policy and health resource allocation for HIV/AIDS prevention can be developed in the future.
None of the current methods for MSM estimation is satisfactory; we try to use an ingenious method, a modified LMS Method, to estimate MSM size.
The LMS method is simple to use in MSM estimation. It requires no private information, and the data it needs can be obtained by reliable and feasible methods.
The LMS* procedure proposed in this study can integrate virtual venues in estimation of MSM population. We conclude that the LMS* procedure is suitable for the estimation of hard-to-reach populations.
The authors thank China-Gates Cooperation Programme for funding support. The authors thank Zuoan and Zhongda gay organisations for providing information and assistance; and also thank the contributions of investigators (Meiling Luo, Shaya Wang, Yawei Guo, Chang Cai, Bin Li, Zexia Li, Xiangwen Lai, and others).
Contributors HC designed the study, directly administered its implementation, and wrote the manuscript. YZ helped contact the gay organisations and survey fields, and supervised and analysed the qualitative interviews. HT administered the overall study, including the design, implementation, analysis and literature revision. DL helped supervise the field activities and writing. MC, NC, and YB took part in the field activities and in data analysis and assembly. SW provided suggestions on the study and edited the manuscript for correct usage of English.
Funding The China-Gates Cooperation Programme Responsible by Tsinghua University (grant number: 20100147)
Competing interests None.
Patient consent Obtained.
Ethics approval The estimation of MSM size study obtained ethical approval from Xiangya Medical School Research Ethics Committee.
Provenance and peer review Not commissioned; externally peer reviewed.