Evaluation of bias in HIV seroprevalence estimates from national household surveys

Objectives: To evaluate HIV seroprevalence estimates from demographic and health surveys (DHS) and AIDS indicator surveys (AIS) for potential bias because of non-response and exclusion of non-household population groups. Methods: Data are from 14 DHS/AIS surveys with HIV testing, conducted during 2003–6. Blood samples were collected and analysed for HIV using standard laboratory and quality control procedures. HIV prevalence among non-tested adults was predicted based on multivariate statistical models of HIV for those who were interviewed and tested, using a common set of predictor variables. Estimates of the size of non-household populations in national censuses were used to assess potential bias because of their exclusion in the household surveys under different assumptions about proportion of adults and HIV prevalence in non-household populations. Results: Non-tested men had significantly higher predicted HIV prevalence than those tested in eight of the 14 countries, while non-tested women had significantly higher predicted prevalence than those tested in seven of the 14 countries. Effects of non-response were somewhat stronger in lower-prevalence countries. The overall effect of non-response on observed national HIV estimates was small and insignificant in all countries. Estimated effects of exclusion of non-household population groups were generally small, even in concentrated epidemics in India and Cambodia under the scenario that 75% of the non-household population was adults having 20 times greater HIV prevalence than adults in household surveys. Conclusions: Non-response and the exclusion of non-household population groups tend to have small, insignificant effects on national HIV seroprevalence estimates obtained from household surveys.

In countries with generalised epidemics, national estimates of HIV prevalence levels and trends in the adult population are generally derived indirectly from HIV surveillance among pregnant women attending selected antenatal clinics. 1 2 Recently, HIV seroprevalence data have also been collected in national population-based surveys, such as the demographic and health surveys and AIDS indicator surveys. 3 These surveys have enabled direct estimates of population HIV prevalence. 4 5 A major challenge for the surveys is potential bias as a result of non-response. 4 6-8 Some eligible respondents may be absent at the time of the survey while others may be incapacitated or refuse to participate. The survey estimates of HIV prevalence may be biased to the extent that non-responders have different HIV prevalence levels than the responders. There is much evidence that mobility, which is one of the reasons for absence at the time of the survey, tends to be associated with higher-risk sexual behaviours 9-11 and risk of sexually transmitted infections, 12 13 including HIV infection. 9-11 14 But some studies have failed to find an association between mobility and risk of HIV infection. 15 16 There is limited, inconclusive research on how refusal to participate in population-based surveys is associated with risky sexual behaviours. 6 17 In a recent study that included an assessment of non-response bias in five countries, Mishra et al 4 concluded that non-responders tend to have somewhat higher HIV prevalence, but this bias has no significant effects on national seroprevalence estimates. Other previous studies have also failed to establish that population-based surveys produce significantly downward-biased national HIV seroprevalence estimates. 17-19
Another major challenge for the surveys is potential bias because of the exclusion of non-household population groups. Survey estimates may be biased to the extent that people residing in institutions (such as brothels, prisons, hostels, military/police barracks, long-term care homes) or those who are homeless have different HIV prevalence levels than those living in households and included in the survey sample. While there is considerable evidence that some of the institutional populations (such as brothels 20 21 and prisons 22 23 ) and the homeless 24 tend to have higher risk of HIV infection, no previous empirical research has examined how exclusion of non-household population groups might affect national prevalence estimates based on household samples.
In this study, we expand the analysis of non-response bias in HIV seroprevalence estimates to 14 demographic and health surveys (DHS) and AIDS indicator surveys (AIS). Additionally, in five surveys with varying levels of HIV prevalence, we evaluate potential bias in national seroprevalence estimates because of exclusion of non-household population groups.

METHODS
This study uses data from 14 nationally representative surveys of adult women and men, conducted between 2003 and 2006. Eleven of these surveys were DHS: Burkina Faso, Cambodia, Cameroon, Ethiopia, Ghana, India, Kenya, Lesotho, Malawi, Rwanda, Zimbabwe; and three were AIS: Cote d'Ivoire, Tanzania, Uganda. All these surveys included HIV testing, and HIV test results were linked anonymously to respondents' socioeconomic and behavioural characteristics. Dried blood spot samples were collected (venous blood in Uganda) and analysed for HIV using standard laboratory and quality control procedures and internationally accepted ethical standards. 25
In most surveys, nationally representative samples of women age 15-49 and men age 15-59 were tested for HIV. The only exceptions are Uganda, where women age 15-59 were tested; Tanzania, Cote d'Ivoire and Cambodia, where men age 15-49 were tested; and India, Kenya, Malawi and Zimbabwe, where men age 15-54 were tested. In the 14 countries included in this analysis, the numbers eligible for HIV testing ranged from 3305 males and 3758 females in Lesotho to 64 175 males and 62 182 females in India.

Analysis of bias because of non-response
To estimate the extent of non-response bias and its potential impact on the observed HIV rates in the 14 countries with linked data, all eligible respondents were divided into four groups: (1) interviewed and tested; (2) not interviewed but tested; (3) interviewed, not tested; and (4) not interviewed, not tested. Eligibility for individual interview and HIV testing was based on the de facto population.
To evaluate the effect of non-response bias on the survey estimates, HIV prevalence is predicted among non-responding adults (groups 3 and 4) based on multivariate models of HIV for those who were interviewed and tested (group 1), using a common set of predictor variables. A logistic regression model is used, after accounting for clustering in the survey design, to calculate predicted HIV prevalence separately for the "not interviewed, not tested" and "interviewed, not tested" groups. Predictions for the "not interviewed, not tested" group are based on a limited set of variables (only from the household questionnaire), but predictions for the "interviewed, not tested" group additionally use several individual sociodemographic and behavioural characteristics of the respondents, as collected in the survey (see footnotes to table 2).
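As a rough illustration of the prediction step described above, the sketch below applies logistic regression coefficients (assumed here to have been estimated already on group 1) to non-tested respondents and averages the predicted probabilities using sampling weights. The coefficient values, the predictor names (`urban`, `age_25_34`) and the function `predicted_prevalence` are hypothetical. The study fitted its models in Stata with a much larger predictor set, and this sketch does not reproduce the adjustment for clustering.

```python
import math


def inv_logit(x):
    """Convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def predicted_prevalence(respondents, intercept, coefs):
    """Weighted mean predicted probability of HIV infection.

    respondents: list of (sampling_weight, {predictor_name: value}) pairs.
    intercept, coefs: coefficients assumed to come from a model fitted on
    the interviewed-and-tested group (hypothetical values are used below).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for weight, predictors in respondents:
        log_odds = intercept + sum(
            coefs[name] * value for name, value in predictors.items()
        )
        weighted_sum += weight * inv_logit(log_odds)
        total_weight += weight
    return weighted_sum / total_weight


# Hypothetical coefficients and two hypothetical non-tested respondents.
intercept = -3.0
coefs = {"urban": 0.6, "age_25_34": 0.4}
non_tested = [
    (1.2, {"urban": 1, "age_25_34": 0}),
    (0.8, {"urban": 0, "age_25_34": 1}),
]
print(round(predicted_prevalence(non_tested, intercept, coefs), 4))
```

Averaging predicted probabilities, rather than classifying each respondent as positive or negative, gives a group-level prevalence estimate directly comparable with the observed prevalence among those tested.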
Multivariate analyses used Stata version 9.0. Analysis was carried out separately for males and females for each country. Adjusted HIV prevalence was calculated as a weighted average of observed prevalence among those who were tested and predicted prevalence in the two groups of non-tested respondents. Sampling weights were applied in accordance with standard DHS procedures. We used HIV sampling weights for the tested, individual sampling weights for the "interviewed, not tested", and household sampling weights for the "not interviewed, not tested" groups, respectively.
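The weighted average used for the adjusted prevalence can be sketched as follows. This is a minimal illustration that takes the weighted size and prevalence of each group as given. The function name `adjusted_prevalence` and all numbers are hypothetical and are not taken from any of the surveys.

```python
def adjusted_prevalence(groups):
    """Weighted average of prevalence across respondent groups.

    groups: list of (weighted_count, prevalence) pairs, one per group.
    """
    total = sum(count for count, _ in groups)
    return sum(count * prev for count, prev in groups) / total


# Hypothetical weighted counts and prevalences (as proportions).
groups = [
    (8000.0, 0.050),  # tested: observed prevalence
    (1200.0, 0.060),  # interviewed, not tested: predicted prevalence
    (800.0, 0.070),   # not interviewed, not tested: predicted prevalence
]
adjusted = adjusted_prevalence(groups)
print(f"observed {groups[0][1]:.4f}, adjusted {adjusted:.4f}")
```

Because the tested group carries most of the weight, even a noticeably higher predicted prevalence among the non-tested moves the adjusted figure only slightly.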

Analysis of bias because of exclusion of non-household population
In five of the countries (Cambodia, India, Ghana, Uganda and Lesotho), we examine the potential effect of excluding non-household population groups on the survey estimates of HIV prevalence for adults age 15-49. These countries were chosen to represent varying levels of HIV prevalence.
For this purpose, we obtained national estimates of the size of household population, size of non-household population (including both institutional and homeless), total population, the annual population growth rate and the proportion of adults age 15-49 in the total population in each country. 26-30 Using the annual growth rate, the household, non-household and total population sizes were projected to the DHS survey year. Next, using the proportion of adults in the total population, numbers of adults in the household, non-household and total population were estimated for the survey year. Adults are more likely to live in institutions and be homeless than children or the elderly, but information on the age structure of the non-household population was not readily available from the census in most cases. We therefore used different assumptions about the proportion of adults in the non-household population and the level of HIV prevalence among non-household adults to estimate overall HIV prevalence among all adults in each country (accounting for exclusion of non-household population groups).
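The projection step might look like the sketch below. The text does not state the projection formula, so compound annual growth is an assumption here, and the function names and example figures are hypothetical.

```python
def project_population(census_count, annual_growth_rate, years_elapsed):
    """Project a census count forward, assuming compound annual growth."""
    return census_count * (1.0 + annual_growth_rate) ** years_elapsed


def adults_in_survey_year(census_count, annual_growth_rate,
                          years_elapsed, proportion_adults):
    """Estimated number of adults in the survey year."""
    projected = project_population(census_count, annual_growth_rate,
                                   years_elapsed)
    return projected * proportion_adults


# Hypothetical example: census 5 years before the survey, 2% annual growth,
# half of the population assumed to be adults age 15-49.
print(round(adults_in_survey_year(1_000_000, 0.02, 5, 0.5)))
```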
We estimated the potential impact of excluding non-household population groups under the following three scenarios:
Scenario A (baseline): The proportion of adults in the non-household population is the same as in the census population; and HIV prevalence among non-household adults is the same as the prevalence among adults in the household survey.
Scenario B: The proportion of adults in the non-household population is 66.67%; and the HIV prevalence among non-household adults is 10 times (India and Cambodia), five times (Ghana), two times (Uganda) or 1.5 times (Lesotho) the prevalence among adults in the household survey.
Scenario C: The proportion of adults in the non-household population is 75.00%; and the HIV prevalence among non-household adults is 20 times (India and Cambodia), 10 times (Ghana), four times (Uganda) or two times (Lesotho) the prevalence among adults in the household survey.
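The scenario arithmetic can be sketched as a population-weighted average of household and non-household prevalence. The function `overall_adult_prevalence` and the population sizes below are hypothetical, chosen only to show the mechanics. They are not the census figures used in the study, so the outputs will not match table 3.

```python
def overall_adult_prevalence(household_adults, non_household_population,
                             proportion_adults_non_household,
                             household_prevalence, prevalence_multiplier):
    """HIV prevalence among all adults under one scenario.

    Prevalences are proportions (0.0028 means 0.28%).
    """
    non_household_adults = (non_household_population
                            * proportion_adults_non_household)
    # Cap at 1.0 so a large multiplier cannot give an impossible prevalence.
    non_household_prevalence = min(
        1.0, household_prevalence * prevalence_multiplier)
    infected = (household_adults * household_prevalence
                + non_household_adults * non_household_prevalence)
    return infected / (household_adults + non_household_adults)


# Hypothetical inputs: 1,000,000 household adults, a non-household
# population of 20,000, and a household survey prevalence of 0.28%.
scenarios = {
    "A": (0.50, 1),    # census proportion of adults (assumed 50% here)
    "B": (2 / 3, 10),
    "C": (0.75, 20),
}
for name, (prop_adults, multiplier) in scenarios.items():
    prev = overall_adult_prevalence(1_000_000, 20_000, prop_adults,
                                    0.0028, multiplier)
    print(name, f"{100 * prev:.3f}%")
```

Under scenario A the multiplier is 1, so the overall estimate equals the household estimate regardless of the size of the non-household population.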

RESULTS
HIV prevalence among adults in the 14 countries ranged from less than 1% in India and Cambodia to 23.2% in Lesotho. Despite large HIV prevalence differences among the surveys, fairly consistent patterns of HIV infection are observed by age, sex and urban/rural residence (data not shown).

Estimates of bias because of non-response
Household response rates were very high in all surveys (93% or higher) (table 1). Response rates for the individual interview were also above 90% in most surveys. Individual interview response rates for females ranged from 90% in Cote d'Ivoire and Zimbabwe to 98% in Rwanda. Individual interview response rates for males were lower than for females in all 14 countries, and ranged from a low of 82% in Zimbabwe to 97% in Rwanda.
Response rates for HIV testing were lower than those for individual interview in all cases. In seven of the 14 countries, the difference in the response rates for individual interview and for HIV testing was greater than 10 percentage points for both males and females. The highest differences were observed in Malawi, where the response rate for HIV testing was 23 percentage points lower for males and 25 percentage points lower for females than the corresponding response rates for individual interview. On the other hand, Rwanda had the smallest differences between the individual interview and HIV testing response rates of about 2 percentage points for males and 1 percentage point for females.
HIV response rates for males were lowest in Malawi and Zimbabwe (63%), followed by Lesotho (68%) and Kenya (70%). The highest male HIV response rates were in Rwanda (96%), followed by Cambodia and Cameroon (90% each). Similar to individual interview response rates, HIV response rates for females were considerably higher than for males in all countries. Female HIV response rates ranged from 70% in Malawi to 97% in Rwanda, and were above 90% also in Cameroon, Burkina Faso and Cambodia.
Refusal was a more important reason for HIV non-response than absence in all countries for women (except in Rwanda) and in nine of the 14 countries for men. In Rwanda, very few women or men refused testing. In all countries, men were much more likely than women to be absent for testing. In 12 of the 14 countries, the HIV non-response rate because of absence was two to four times greater for men than for women.
Non-response rates because of both refusal and absence were much higher in urban areas than in rural areas. Also, the non-response rates were considerably higher among more educated and wealthier respondents. In five of the eight countries where data on chronically ill adults (seriously ill for three or more months in the past year) were available, response rates were slightly higher among chronically ill adults than among adults who were not chronically ill. There were no clear patterns in the HIV non-response rates by various risk and protective factors (data not shown).
In most countries, non-tested males and females have higher predicted HIV prevalence than the observed prevalence among those who were tested (table 2). In eight of the 14 countries for males and in seven of the 14 countries for females, the predicted prevalence among non-tested individuals is significantly greater than the observed prevalence among those tested. In Uganda for both males and females and in Kenya for females, the predicted prevalence among the non-tested individuals is significantly lower than among those tested.
Adjusting the observed national HIV estimates from tested males and females by accounting for the predicted rates among the non-tested makes little difference to the observed estimates in most cases (fig 1). Even in countries where predicted prevalence among the non-responders is significantly higher or lower, the adjusted prevalence for all eligible respondents is about the same as the observed prevalence based only on the tested respondents. Although not statistically significant in any of the 14 countries, the effects of non-response tend to be somewhat greater among lower prevalence countries for both males and females.

Estimates of bias because of exclusion of non-household population
Our simulation analyses for India, Cambodia, Ghana, Uganda and Lesotho show that under varying assumptions of much greater HIV prevalence among non-household adults, estimated bias because of exclusion of non-household population groups in national HIV prevalence estimates from household samples tends to be small (table 3). In India, for example, under scenario B where the proportion of adults age 15-49 in the non-household population is assumed to be 67% and the HIV prevalence among non-household adults is assumed to be 10 times the prevalence among household adults (2.80%), the estimated HIV prevalence among all adults increases only slightly, from 0.28% to 0.31%. Under scenario C, where the proportion of adults in the non-household population is assumed to be 75% and the HIV prevalence among non-household adults is assumed to be 20 times the prevalence among household adults (5.60%), the estimated HIV prevalence among all adults increases to 0.35%. Similarly, in Cambodia, the observed HIV prevalence in the survey (0.62%) increases to 0.77% under scenario B and 0.98% under scenario C. In Ghana, Uganda and Lesotho, with much higher levels of HIV prevalence, estimated bias because of exclusion of non-household population groups tends to be relatively smaller.
Footnote to table 2: Additional variables for predicting HIV in the "interviewed, not tested" group included: marital status; childbirth in last five years (women only); work status; media exposure; ethnicity; religion; circumcision (men only); STI or STI symptoms in the last 12 months; alcohol use at last sex in the last 12 months; number of sex partners in the last 12 months; cigarette smoking/tobacco use; age at first sex; number of lifetime sexual partners; number of sexual partners in the last 12 months; condom use at last sex in the last 12 months; higher-risk sex (sex with a non-marital, non-cohabiting partner) in the last 12 months; knowledge of prevention methods (abstinence, being faithful and condom use); attitudes towards people living with HIV (PLHIV); woman's ability to negotiate safer sex with spouse; woman's participation in household decision-making (women only); number of medical injections in the last 12 months; duration of stay in current place of residence; number of times slept away in the last 12 months (men only); away (from usual place of residence) for more than one month in the last 12 months (men only); and previously tested for HIV. The list of additional variables used varied slightly from country to country, depending on the availability of information.

DISCUSSION
HIV response rates for females were considerably higher than for males in all countries. The lower response rates for males mainly reflect more frequent absence of men from the households. In 12 of the 14 countries, the HIV non-response rate because of absence was two to four times greater for males than for females. Non-response rates were higher among urban, more-educated and wealthier respondents. These patterns of non-response are typical of most household surveys in developing countries. However, there were no clear patterns in non-response rates by various risk and protective factors. Chronically ill adults were equally or more likely to participate in the surveys, suggesting that differential participation of chronically ill adults is unlikely to be a major source of bias.
Non-responding males and females tend to have higher predicted HIV prevalence than those tested. In eight of the 14 countries for males and in seven of the 14 countries for females, non-responders have significantly higher predicted prevalence, but consistent with previous research, the overall effects of non-response on the observed national HIV prevalence estimates are small and insignificant in all 14 countries. 4 17-19 The small effects of the non-response bias on the observed national estimates are due mainly to the much smaller proportion of non-responders relative to those who were tested in the surveys. The effects of non-response are somewhat greater among lower prevalence countries for both males and females.
Our analysis of potential bias in the national HIV prevalence estimates because of the exclusion of non-household population in five countries indicated that exclusion of non-household population groups in the surveys is likely to have only a minimal effect on the observed national HIV prevalence estimates. This bias is expected to be greater in countries with concentrated epidemics. Our analysis shows that even in countries with concentrated epidemics (for example, India with a survey HIV prevalence estimate of 0.28%), HIV prevalence in the non-household groups needs to be orders of magnitude higher for it to have any significant effect on the national estimate based on the household sample.
In the analysis of the non-response bias, a major limitation is that the estimates are only adjusted to the extent that the sociodemographic and behavioural characteristics included in the analysis are correlated with the risk of HIV infection. Despite including about 30 predictor variables in the regression models, only about 20% of variation in HIV prevalence is explained in most countries, indicating the limitation of such modelling in explaining behavioural health outcomes. Another limitation is that the adjustments for not interviewed, not tested respondents are based on limited information available from the household questionnaire. Future surveys should attempt to collect additional information on this group (mostly absentees) to better assess potential bias due to their exclusion.
Our analysis is based on a de facto household-based sample of the national population. A de facto sample assumes that usual residents (de jure household members) who did not spend the previous night in their own household are, on average, interviewed in a household they may be visiting. A de facto sample maximises participation rates and avoids potential double counting of respondents. HIV seroprevalence estimates based on de facto samples may be biased to the extent that some of the de jure household members who slept away may not be visiting another household and to the extent that such people have differential HIV prevalence. Furthermore, the adjustments for bias because of non-response and exclusion of non-household population groups do not account for a small proportion (usually 1-3%) of sampled households that were not interviewed in the surveys. Finally, the assumptions regarding HIV prevalence and the proportion of adults in the non-household population are arbitrary. However, in India where information on the age structure of non-household population was available from the census, the proportion of adults in the non-household population was much lower (56%) than the assumed levels of 67% and 75% in the analysis. Moreover, because males tend to have lower prevalence than females and because a great majority of the institutional and homeless population tends to be males, our assumptions of 10 and 20 times greater prevalence among non-household adults seem reasonable.
Our analyses suggest that population-based surveys provide reliable, nationally representative direct estimates of HIV seroprevalence in countries with generalised epidemics. HIV prevalence data from population-based surveys can be useful in understanding the magnitude and spread of the epidemics and in calibrating estimates from sentinel surveillance.