Abstract
Objective The number of persons living with HIV/AIDS, the number of new infections and the number of persons at risk for HIV infection are the foundations of evidence-based prevention, treatment and care planning. However, few jurisdictions have complete and accurate estimates of these indicators. HIV/AIDS case reporting, which includes only persons diagnosed with infection and reported to health departments, does not capture all HIV/AIDS cases and thus underestimates the true size of the epidemic. Obtaining direct measures of HIV incidence is methodologically challenging. Moreover, no census exists of the number of persons at highest risk for infection, including men who have sex with men (MSM).
Method We present a triangulation approach that draws on multiple empirical and overlapping sources of information, analysed with different methods, to synthesise data-based estimates of the prevalence, incidence and denominator of MSM at risk for infection in San Francisco. We further use existing data to establish plausible upper and lower bounds for each estimate.
Result We arrived at an overall population size of 66 487 MSM in San Francisco as of 31 December 2010. The number of MSM living with HIV/AIDS was 15 873, corresponding to an HIV prevalence of 23.9%. We projected 806 new cases in 2010, an incidence rate of 1.59% per year.
Conclusions While not without limitations, our estimates provide useful information for HIV/AIDS prevention and care planning, drawing on diverse sources that may be available in local health jurisdictions. We believe that our approach enhances the credibility of such estimates by mitigating the bias that comes from relying on a single data source or methodological approach.
 EPIDEMIOLOGY (GENERAL)
 HIV
 PUBLIC HEALTH
 SEROPREVALENCE
Introduction
Estimates of the number of persons with HIV/AIDS, the number of new infections and the number of persons at risk for HIV infection are the foundations of prevention, treatment and care planning. Few jurisdictions have complete estimates of these measures. HIV/AIDS case reporting, which includes only persons diagnosed with infection and reported to health departments, underestimates the size of the epidemic. In the era of highly active antiretroviral therapy, HIV prevalence is not a reliable indicator of the direction of the epidemic, because longer survival can raise prevalence even while new infections decline. HIV incidence is the ideal measure of the direction and size of the epidemic, but it is difficult to measure with available technology and methodologies.1 For planning, the size of the populations affected by HIV is required to gauge the resources needed. Unfortunately, there are no complete censuses of the groups at risk, including men who have sex with men (MSM), who account for the largest number of new HIV infections in the USA.2
Methods proposed to estimate national and local HIV epidemics include the Workbook method, Multiparameter Evidence Synthesis, the Estimation and Projection Package3,4 and Bayesian evidence synthesis.5 However, these approaches usually require assumptions about the size of the population at risk, HIV prevalence and incidence. In many contexts, the population-based estimates, and the statistical measures of their uncertainty, that such methods require are incomplete, unavailable or borrowed from other populations. The resulting estimates are then not based on local empirical data. One solution to this problem is to maximise the use of local data through synthesis and triangulation of multiple independent and overlapping sources of information.6 This approach, described in WHO guidelines,7 has been used by the San Francisco Department of Public Health to arrive at estimates of HIV prevalence and incidence on which to allocate resources, set programme targets and help evaluate reach and impact. To provide a case study in the use of these methods and to assess HIV epidemic trends among MSM in San Francisco, we synthesised and summarised the data available in our city through 2010, building on processes previously conducted in 2001 and 2006.
Methods
Data sources
Data were collected from a wide variety of sources, including behavioural and seroprevalence surveys, the HIV/AIDS case registry, community-based organisation programme data, the municipal sexually transmitted diseases (STD) clinic and the US census. Many data sources were not independent but had overlapping elements (eg, seroprevalence surveys include HIV-infected MSM who may or may not appear in the case registry, depending on whether they had previously been diagnosed and reported); these overlaps provided the basis for estimation in some methods.
Each source provided different data elements needed to estimate all or part of the population size, HIV prevalence and HIV incidence among MSM. For example, case reporting provided counts of persons diagnosed with HIV, community-based seroprevalence surveys provided estimates of total HIV prevalence (diagnosed and undiagnosed) and service data provided unduplicated counts of MSM clients. In this synthesis, 19 sources of data were used (see online supplementary appendix A).
Analytic procedures
The approach began with compiling all available data on population size, HIV incidence and HIV prevalence. Often there were several separate estimates for each of these elements. In these instances, the median of each group of estimates was used as the point estimate, on the rationale that the median is less vulnerable to outliers, which can cause severe bias in any one method. Although not employed in this analysis, sensitivity analyses could be undertaken to compare estimates derived from means and medians. In some instances, such as when there were few estimates, the mean was used. We also examined point estimates at the upper and lower ends of each group to select plausible lower and upper bounds, using common-sense rules to eliminate implausible outliers. For example, the estimated total number of persons living with HIV/AIDS cannot be smaller than the number recorded in the case registry (ie, undiagnosed cases would logically make the actual number of persons living with HIV/AIDS higher than the number diagnosed). Thus, our plausibility bounds were based on point estimates derived from empirical data and corroborated by other data, rather than on statistical uncertainty (figure 1).
HIV prevalence was defined as the median of all point estimates of the number of persons living with HIV divided by the median point estimate of the MSM population size. For HIV incidence, we explored scenarios of how different levels of measured incidence from different sources would affect the predicted number of new cases, given the estimated number of MSM who were uninfected as of 31 December 2010. HIV incidence and prevalence were then compared with the prior estimates from 2001 and 2006 to further assess their plausibility and gauge changes in the epidemic over the last several years.
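The median-and-bounds step can be sketched in Python. This is a simplified illustration rather than the authors' procedure: the function name `triangulate`, the `floor`/`ceiling` parameters for logical limits and all numbers are hypothetical, and the bounds here are simply the smallest and largest estimates that survive the logical checks (in the paper the bounds were also corroborated against other data).

```python
from statistics import median


def triangulate(estimates, floor=None, ceiling=None):
    """Median point estimate plus simple plausibility bounds.

    floor/ceiling are hypothetical logical limits, e.g. the case-registry
    count is a floor for the number of persons living with HIV/AIDS.
    """
    # The median is the point estimate, so one outlying source has little pull.
    point = median(estimates)
    # Bounds are taken only from estimates that pass the logical checks.
    plausible = [
        e for e in estimates
        if (floor is None or e >= floor) and (ceiling is None or e <= ceiling)
    ]
    if not plausible:
        raise ValueError("no estimate passes the plausibility checks")
    return point, min(plausible), max(plausible)


# Hypothetical estimates of persons living with HIV and a hypothetical registry count.
point, lower, upper = triangulate([9000, 12500, 13000, 14000, 17000], floor=10000)
print(point, lower, upper)  # 13000 12500 17000
```

The 9000 estimate still counts toward the median but is excluded as a lower bound because it falls below the (hypothetical) registry count.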
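A minimal sketch of the two definitions above, assuming plain lists of point estimates: prevalence as a ratio of medians, and incidence scenarios as candidate annual rates applied to the uninfected remainder of the population. Function names and inputs are hypothetical.

```python
from statistics import median


def hiv_prevalence(plwh_estimates, population_estimates):
    """Median persons living with HIV divided by median population size."""
    return median(plwh_estimates) / median(population_estimates)


def incidence_scenarios(rates, population, plwh):
    """Projected new infections among the uninfected for each annual rate."""
    uninfected = population - plwh
    return {rate: round(rate * uninfected) for rate in rates}


# Hypothetical inputs, for illustration only.
plwh = [2000, 2200, 2500]
population = [10000, 11000]
prev = hiv_prevalence(plwh, population)
print(round(prev, 3))  # 0.21
print(incidence_scenarios([0.01, 0.02], median(population), median(plwh)))
# {0.01: 83, 0.02: 166}
```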
Methods of estimation
Overall and adult male population size of San Francisco
We examined four population size estimates for San Francisco as a baseline for the total number of men living in the city: the 2000 US Census, California State Finance Department estimates for 2008 and 2009, and the American Community Survey 2006–2008 estimate. We used the median of these as the base total population of San Francisco. We then applied the US Census proportion of males aged 18 years and older to this total, as the basis for estimating the proportion and number of adult men who were MSM.
Population size
Estimation began with determining the total number of MSM living in San Francisco, subcategorised into MSM who were non-injection drug users (non-IDU) and MSM–IDU. Population sizes were then calculated using several methods. The first was the multiplier method.8 Using this method, we collected information from population-based studies that measure the use of a particular service during a given period. Combining this with the total client count at that service for the same period, the total population size was estimated by the formula:

$$N = \frac{n}{p}$$

where N is the total MSM population size, n is the number of MSM using a particular service in a specified period and p is the proportion of MSM in the population survey who reported using that service in the same period. For example, 3% of a sample of MSM reported having an HIV test at the municipal STD clinic, and the clinic provided an unduplicated count of 2029 MSM clients. Thus, 2029/0.03 = 67 633 MSM were estimated to reside in San Francisco. Several multipliers, based on data from multiple services, were used to generate multiple point estimates, from which the median was selected. Using multiple services minimised the influence of the biases of any one multiplier or data source.
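The multiplier method and the median across services can be sketched as follows. Only the STD clinic figures (2029 clients, 3%) come from the text; the other two services and their numbers are hypothetical, included to show how the median across several multipliers is taken.

```python
from statistics import median


def multiplier_estimate(client_count, proportion_using_service):
    """Multiplier method: N = n / p."""
    if not 0 < proportion_using_service <= 1:
        raise ValueError("p must lie in (0, 1]")
    return client_count / proportion_using_service


# (n, p) per service; services B and C are hypothetical.
services = {
    "municipal STD clinic": (2029, 0.03),
    "service B (hypothetical)": (1500, 0.025),
    "service C (hypothetical)": (4200, 0.06),
}
estimates = {name: multiplier_estimate(n, p) for name, (n, p) in services.items()}
print(round(estimates["municipal STD clinic"]))  # 67633
print(round(median(estimates.values())))         # 67633
```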
We also employed an HIV prevalence-based estimation method developed by Lieb et al.9 The method used case surveillance data, HIV prevalence estimates, US Census data and an estimate of undiagnosed HIV infection to project the total population size. For example, we took the US Census estimate of the number of men aged 18 years and older (360 513), the total number of known MSM and MSM–IDU HIV/AIDS cases from the case registry (11 502), an estimate of HIV prevalence (22.7%) and an estimate of undiagnosed HIV infection (20% of HIV positives) from community surveys. MSM population size was estimated from three independent sources of data thus:

$$N = \frac{C\,(1+u)}{P} = \frac{11\,502 \times 1.20}{0.227} \approx 60\,804$$

where C is the number of known cases in the case registry, u is the estimated proportion of undiagnosed infection and P is the estimated HIV prevalence.
The US Census data provided an opportunity to check the plausibility of the estimate as a proportion of adult males in the city; in this case, 60 804 estimated MSM among 360 513 adult males would mean that 16.9% of adult men in San Francisco were MSM.
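The prevalence-based calculation and the census plausibility check can be reproduced in a few lines. The inflation step multiplies registry cases by (1 + u), which is the form that reproduces the 60 804 figure in the text; the function name is hypothetical.

```python
def lieb_population_size(known_cases, hiv_prevalence, undiagnosed_fraction):
    """Inflate registry cases for undiagnosed infection, then divide by prevalence."""
    total_cases = known_cases * (1 + undiagnosed_fraction)
    return total_cases / hiv_prevalence


adult_men = 360_513  # US Census, men aged 18 years and older
n_msm = lieb_population_size(11_502, 0.227, 0.20)
print(round(n_msm))                       # 60804
print(round(100 * n_msm / adult_men, 1))  # 16.9  (% of adult men who are MSM)
```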
HIV prevalence
HIV prevalence was estimated using both HIV prevalence estimates from MSM and MSM–IDU-specific studies and case counts from the case registry. Typically, prevalence estimates are derived either from the proportion HIV positive in a population-based sample or by estimating the total number of cases in the population. The latter figure is based on the case registry together with an estimate of the proportion of cases that have not been diagnosed and reported. Thus, the major challenges in estimating HIV prevalence are obtaining representative samples of the population and accurately estimating the percentage of the population who are HIV-infected but undiagnosed. We cross-checked HIV prevalence by assessing whether the number of new cases and deaths since the previous estimates would plausibly arrive at the new estimate. We call this a ‘roll forward’ process. The roll forward method started with the total estimated number of cases of HIV infection from the prior synthesis, added the estimated new infections per year for the intervening years since the last synthesis, and then subtracted the actual number of deaths among persons with HIV infection, as follows:

$$P_{\text{new}} = k_c + t \times e_c - d$$

where kc is the number of cases estimated in the prior synthesis, ec is the estimated number of new cases per year, t is the number of years since the prior synthesis and d is the number of deaths in the interval.
For example, starting with 14 205 estimated cases (kc) among MSM in 2006, adding an estimated 772 new cases (ec) per year for 4 years and subtracting 530 deaths (d) between 2006 and 2009 gives 14 205 + 4 × 772 − 530 = 16 763 HIV cases estimated among MSM for 2010.
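The roll forward arithmetic, as a small Python function (the name is hypothetical). Passing new infections as a per-year list is a design choice that allows year-specific estimates; the worked example simply repeats 772 four times.

```python
def roll_forward(prior_cases, new_cases_by_year, deaths):
    """Prior estimate + estimated new infections in each intervening year - deaths."""
    return prior_cases + sum(new_cases_by_year) - deaths


# 2006 estimate for MSM, 772 estimated new cases in each of 4 years, 530 deaths.
print(roll_forward(14_205, [772] * 4, 530))  # 16763
```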
A ‘compartmental method’6 was also used, which built up the total number by estimating each piece of the population separately:
For example, we had 4867 MSM AIDS cases in our case registry (mla) plus 6218 new MSM HIV (non-AIDS) diagnoses (nhiv), adjusted for the proportion of infected but undiagnosed MSM (23%) from a quasi-population-based study (unk), giving an estimated 14 408 MSM living with HIV.
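The compartmental build-up can be sketched as below, under an assumption the text does not state explicitly: that the 23% is the share of all infections that are undiagnosed, so total = diagnosed / (1 − u). The function name is hypothetical.

```python
def compartmental_total(aids_cases, hiv_non_aids_cases, undiagnosed_fraction):
    """Assumed form: diagnosed cases / (1 - share of all infections undiagnosed)."""
    diagnosed = aids_cases + hiv_non_aids_cases
    return diagnosed / (1 - undiagnosed_fraction)


print(round(compartmental_total(4867, 6218, 0.23)))  # 14396
```

With the rounded 23% input this gives about 14 396 rather than the 14 408 reported, so the authors' exact formula or unrounded inputs may differ from this assumed form.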
Finally, the estimated number of HIV/AIDS cases was divided by the total population size estimate to produce the final estimate for HIV prevalence for MSM and MSM–IDU.
HIV incidence
HIV incidence was estimated using several methods, including laboratory-based HIV incidence assays (collectively referred to as serological testing algorithms for recent HIV seroconversion, or ‘STARHS’) applied to specimens collected in specific studies or testing services. Examples used in San Francisco included the ‘detuned ELISA’ and the BED assay.10,11 More recently, additional information (ie, prior HIV testing history, CD4 counts and viral load) has been taken into consideration to rule out ‘false recent’ infections.12 Another method, developed by Kellogg et al,13 used repeat-testing data from city-funded HIV test sites, linking the records of individuals who tested repeatedly over time to create an open cohort of persons at risk for HIV. We also employed a method by Hallett et al14 that creates a synthetic cohort from cross-sectional HIV prevalence data, together with mortality data, to estimate HIV incidence. This method assumed that individuals in the first cross-sectional survey were represented by individuals in the second, accounting for the change in their age between surveys. Changes in HIV prevalence between the two survey rounds were attributed to incident infections and mortality. The case registry provided the overall mortality rate, which made it possible to solve for new infections and thus estimate incidence.
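To illustrate the repeat-tester idea, here is a minimal sketch of an incidence rate (seroconversions per person-year) from an open cohort of repeat testers. It is not the published Kellogg et al method: the midpoint rule for dating seroconversion, the data layout and all records are assumptions made for illustration.

```python
def repeat_tester_incidence(histories):
    """Seroconversions per person-year in an open cohort of repeat testers.

    histories: person -> list of (time_in_years, is_positive), in any order.
    Assumption: a seroconversion is dated at the midpoint between the last
    negative and first positive test.
    """
    events = 0
    person_years = 0.0
    for tests in histories.values():
        tests = sorted(tests)
        # Need a negative baseline test and at least one follow-up test.
        if len(tests) < 2 or tests[0][1]:
            continue
        start = tests[0][0]
        last_negative = start
        for time, positive in tests[1:]:
            if positive:
                events += 1
                person_years += (last_negative + time) / 2 - start
                break
            last_negative = time
        else:
            # Never seroconverted: at risk until the last negative test.
            person_years += last_negative - start
    if person_years == 0:
        raise ValueError("no person-time at risk")
    return events / person_years


# Hypothetical testing histories.
histories = {
    "A": [(0.0, False), (1.0, False), (2.0, False)],
    "B": [(0.0, False), (2.0, True), (1.0, False)],
    "C": [(0.0, False), (0.5, False)],
    "D": [(0.0, True)],
}
print(repeat_tester_incidence(histories))  # 0.25
```

Person D is excluded because the first test is positive, so there is no time at risk to observe.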
Results
Population size
Eleven estimates of the MSM population size in San Francisco were generated using population-based, multiplier, Lieb's ratio and population trend methods. The estimates ranged from 33 697 to 106 996, with a median of 59 809 MSM (non-IDU) in San Francisco, which we chose as the point estimate. We determined that 49 753 was the lower plausible estimate by calculating how many HIV cases a population of that size would produce given an HIV prevalence of 23.4% (see below). The result, 11 642 estimated HIV cases, is consistent with the number in our case registry; a population size below 49 753 would produce fewer estimated cases than are in the registry. Similarly, we determined that 64 339 MSM was the upper plausible estimate, as it produces 15 055 HIV cases among MSM in San Francisco. Any higher number would imply an unlikely number of undiagnosed HIV cases in the city (ie, >30% of HIV cases among MSM undiagnosed, see below) (see online supplementary appendix B).
Fourteen population size estimates were generated for the number of MSM–IDU. Eleven of these were based on MSM population size estimates and the prevalence of MSM–IDU among MSM; that is, the overall MSM population size estimates (MSM and MSM–IDU) were apportioned to MSM and MSM–IDU using the prevalence of MSM–IDU among MSM. For example, one population-based estimate for MSM was 75 163; in the same study, 10.6% of MSM reported ever using intravenous drugs (75 163 × 10.6% = 7967 MSM–IDU). The other three estimates were based on data collected through an IDU-specific study that included MSM. The median was 6678 MSM–IDU. We selected upper (7369) and lower (4259) plausible estimates in the same manner as for MSM (see online supplementary appendix B).
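The bound check amounts to multiplying each candidate population size by the prevalence and comparing the implied case count with what is already registered. A minimal sketch; the function names, the 40 000 candidate and the 11 500 registry count in the last line are hypothetical.

```python
def implied_cases(population_size, prevalence):
    """Number of HIV cases a candidate population size implies."""
    return population_size * prevalence


def passes_registry_floor(population_size, prevalence, registry_cases):
    """Implausible if it implies fewer cases than are already registered."""
    return implied_cases(population_size, prevalence) >= registry_cases


for label, size in [("lower", 49_753), ("upper", 64_339)]:
    print(label, round(implied_cases(size, 0.234)))
# lower 11642
# upper 15055

print(passes_registry_floor(40_000, 0.234, 11_500))  # False (hypothetical inputs)
```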
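The apportioning step is a single multiplication. The sketch below (function name hypothetical) reproduces the 7967 figure; the non-IDU remainder is simply the difference and is not quoted in the text.

```python
def apportion_idu(total_msm, idu_share):
    """Split an overall MSM estimate into MSM-IDU and MSM non-IDU."""
    idu = total_msm * idu_share
    return idu, total_msm - idu


idu, non_idu = apportion_idu(75_163, 0.106)
print(round(idu), round(non_idu))  # 7967 67196
```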
HIV prevalence
We had five sources of estimates of HIV prevalence among MSM (table 1). Three sources (National HIV Behavioural Surveillance System or NHBS MSM2, Assort! and Stop AIDS Project; see online supplementary appendix A) provided prevalence estimates from serological testing or self-report in community-based surveys. Combined with the population size estimate, these produced an estimated 13 565, 13 391 and 8014 total cases from NHBS MSM2, Assort! and Stop AIDS Project, respectively. Using the compartmental method we estimated 14 200 cases, and using the roll forward method 16 763 cases. The overall median was 13 565 HIV cases among MSM. Dividing this number by the estimated population size of MSM (59 809) gives an HIV prevalence estimate of 22.7%.
For MSM–IDU, we produced three estimates of the number of HIV cases in San Francisco, using prevalence from NHBS MSM, the case registry and the roll forward approach. The median number of HIV cases among MSM–IDU was 2308. Dividing this by the population size estimate (6678) suggests that 34.6% of MSM–IDU were HIV-infected.
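The median and prevalence calculation for MSM (non-IDU), using the five case estimates reported above. Because the median only depends on the middle value, the low Stop AIDS Project figure has no influence on the result.

```python
from statistics import median

# Estimated HIV cases among MSM (non-IDU), by source, from the text.
case_estimates = {
    "NHBS MSM2": 13_565,
    "Assort!": 13_391,
    "Stop AIDS Project": 8_014,
    "compartmental method": 14_200,
    "roll forward method": 16_763,
}
msm_population = 59_809  # median population size estimate, MSM non-IDU

median_cases = median(case_estimates.values())
print(median_cases)                                   # 13565
print(round(100 * median_cases / msm_population, 1))  # 22.7
```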
HIV incidence
We compiled seven incidence rate estimates and four yearly counts of incident cases for MSM (table 2). The median of the rate estimates was 2.1% per year, while the median number of newly reported cases per year over the 4-year period was 339. Using these incidence estimates and the estimated number of uninfected MSM to model the number of new HIV cases, we determined that 339 (or 0.73% per year) was the lower plausible number of incident cases. An incidence of 2.1% would result in the highest plausible estimate of 971 new cases. The midpoint of these two estimates projected 655 (1.42%) new HIV infections among MSM in 2010.
For MSM–IDU, we had four sources of incidence estimates. As for MSM, we suggest that the lower plausible number of new HIV cases among MSM–IDU is reflected by the aggregate number of reported cases, or 50 (1.1%) new MSM–IDU HIV cases. For the upper plausible estimate, we used the median incidence estimate (5.8%) to calculate a total of 253 new cases. Averaging these estimates suggests that there are 151 new MSM–IDU cases (3.4% incidence).
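The lower/upper/midpoint logic for both groups, as a sketch (function name hypothetical). The uninfected denominator is the population size minus the median number living with HIV, which reproduces the counts and MSM rates quoted above.

```python
def incidence_bounds(population, plwh, reported_new_cases, upper_rate):
    """Lower = reported new cases; upper = upper_rate x uninfected; midpoint = mean."""
    uninfected = population - plwh
    lower = reported_new_cases
    upper = round(upper_rate * uninfected)
    midpoint = (lower + upper) / 2
    return uninfected, lower, upper, midpoint


msm = incidence_bounds(59_809, 13_565, 339, 0.021)
idu = incidence_bounds(6_678, 2_308, 50, 0.058)
print(msm)  # (46244, 339, 971, 655.0)
print(idu)  # (4370, 50, 253, 151.5)

uninfected, lower, _, mid = msm
print(f"{100 * lower / uninfected:.2f}% {100 * mid / uninfected:.2f}%")  # 0.73% 1.42%
```

For MSM–IDU the midpoint comes out at 151.5, which the text reports as 151 new cases.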
Summary
The estimated population size of MSM in San Francisco as of 31 December 2010 was 66 487, of whom 6678 were MSM–IDU (table 3). As of 31 December 2010, 15 873 MSM were living with HIV, approximately 23.9% of all MSM, with prevalence higher among MSM–IDU (34.6%) than among other MSM (22.7%). We projected 806 new cases among MSM in San Francisco in 2010 (1.59% incidence overall; 1.42% among MSM non-IDU and 3.4% among MSM–IDU).
We estimated a 17.5% decrease in the number of new cases from 2006 to 2011 among MSM in San Francisco15,16 (see online supplementary appendix D).
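The summary figures are sums of the group estimates. This sketch recomputes the totals, the overall prevalence and the overall incidence, taking incidence as new infections divided by the uninfected population.

```python
# (population, persons living with HIV, projected new infections) per group, from the text.
groups = {
    "MSM non-IDU": (59_809, 13_565, 655),
    "MSM-IDU": (6_678, 2_308, 151),
}

population = sum(g[0] for g in groups.values())
plwh = sum(g[1] for g in groups.values())
new_cases = sum(g[2] for g in groups.values())

prevalence = 100 * plwh / population
incidence = 100 * new_cases / (population - plwh)  # rate among the uninfected
print(population, plwh, new_cases)           # 66487 15873 806
print(f"{prevalence:.1f}% {incidence:.2f}%")  # 23.9% 1.59%
```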
Discussion
This approach provides estimates of HIV prevalence, incidence and population size among MSM and MSM–IDU in San Francisco. We estimated that there were approximately 65 000 MSM in San Francisco in 2010. Overall, HIV prevalence was estimated to be 23.9%. We estimated 806 new HIV infections among MSM and MSM–IDU in 2010. Compared with our 2006 synthesis and triangulation exercise, we believe that there has been a 17.5% decline in the number of annual infections among MSM and MSM–IDU. We outline a data synthesis process that can be replicated in other health jurisdictions where similar data are available. Such estimates are critical for HIV prevention and care planning, and evaluation of programme impact.
Limitations of our approach stem from biases in each data source. These result from how and from whom data were collected (eg, selection bias can arise when data originate from services or convenience samples). Limitations of the STARHS approach to incidence estimation have been described, particularly the problem of ‘false recent’ infections, which lead to overestimates of HIV incidence. We posit that all data sources are likely to be biased, particularly for hard-to-reach populations such as MSM and MSM–IDU. We believe that focusing on the median of multiple estimates helps reduce the likelihood of bias from any one source. Multiple sources may balance each other when one approach tends to overestimate and another to underestimate. For example, an HIV prevalence estimate from an STD clinic would be an overestimate, as men attending the clinic may be more likely to have engaged in high-risk sexual activity. Counterbalancing that estimate might be a lower estimate from a venue-based survey in which persons may not be forthcoming in reporting their HIV status. The midpoint of these may be less biased. A second limitation is the small number of studies that focus directly on MSM–IDU, which necessitated reliance on measures of MSM–IDU indicators within studies of all MSM. Third, although population-level systems are being put in place to provide routine data on HIV-related behaviours, HIV prevalence and HIV incidence among MSM in the USA, these systems cannot provide unbiased estimates of these indicators in isolation. That is, in the absence of a true gold standard, we cannot say which single approach comes closest to the truth, nor do we have a way to calibrate the methods. Data synthesis and triangulation are necessary to reduce the inherent bias in any one source of data and arrive at better estimates of crucial indicators.
While the accuracy of the estimates is hard to assess, prevention and care planning need credible, local estimates. We believe that triangulating the available locally and empirically derived data yields useful estimates of HIV prevalence and incidence among the populations at greatest risk.
Key messages

Obtaining direct measures of HIV prevalence, HIV incidence and the size of populations at risk is methodologically challenging, necessitating triangulation of the available information.

Resulting estimates provide useful information for the purpose of HIV/AIDS prevention and care planning.
References
Supplementary materials
Supplementary Data
This web-only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.
Files in this Data Supplement:
Data supplement 1: Online appendix
Footnotes

Contributors HFR collected the data and led the writing of the manuscript. SB, NB, JH and NO participated in the analysis and writing of the manuscript. WM conceptualised the study, guided the analysis and contributed to writing and editing the final manuscript. All authors reviewed and approved the final version of the manuscript.

Competing interests None.

Provenance and peer review Not commissioned; externally peer reviewed.