Comparison of STD prevalences in the Mwanza, Rakai, and Masaka trial populations: the role of selection bias and diagnostic errors
- K K Orroth¹,
- E L Korenromp²,
- R G White¹,
- J Changalucha³,
- S J de Vlas²,
- R H Gray⁴,
- P Hughes⁵,
- A Kamali⁵,
- A Ojwiya⁵,
- D Serwadda⁶,
- M J Wawer⁷,
- R J Hayes¹,
- H Grosskurth¹
- ¹London School of Hygiene and Tropical Medicine, London, UK
- ²Department of Public Health, Erasmus University, Rotterdam, Netherlands
- ³National Institute for Medical Research, Mwanza, Tanzania
- ⁴Johns Hopkins School of Public Health, Baltimore, USA
- ⁵MRC Programme on AIDS in Uganda, Entebbe, Uganda
- ⁶Institute of Public Health, Faculty of Medicine, Makerere University, Kampala, Uganda
- ⁷Columbia University School of Public Health, New York, USA
- Correspondence to: Kate K Orroth, MPH, Infectious Disease Epidemiology Unit, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, UK;
- Accepted 2 September 2002
Objectives: To assess bias in estimates of STD prevalence from population based surveys arising from diagnostic error and sample selection, and to evaluate the effects of such biases on STD prevalence estimates from three community randomised trials of STD treatment for HIV prevention in Masaka and Rakai, Uganda, and Mwanza, Tanzania.
Methods: Age and sex stratified prevalences of gonorrhoea, chlamydia, syphilis, HSV-2 infection, and trichomoniasis observed at baseline in the three trials were adjusted for sensitivity and specificity of diagnostic tests and for sample selection criteria.
Results: STD prevalences were underestimated in all three populations because of diagnostic errors and selection bias. After adjustment, gonorrhoea prevalence was higher in men and women in Mwanza (2.8% and 2.3%) compared to Rakai (1.1% and 1.9%) and Masaka (0.9% and 1.8%). Chlamydia prevalence was higher in women in Mwanza (13.0%) compared to Rakai (3.2%) and Masaka (1.6%) but similar in men (2.3% in Mwanza, 2.7% in Rakai, and 2.2% in Masaka). Prevalence of trichomoniasis was higher in women in Mwanza compared to women in Rakai (41.9% versus 30.8%). Herpes simplex virus type 2 (HSV-2) seroprevalence and prevalence of serological syphilis (TPHA+/RPR+) were similar in the three populations but the prevalence of high titre syphilis (TPHA+/RPR ≥1:8) in men and women was higher in Mwanza (5.6% and 6.3%) than in Rakai (2.3% and 1.4%) and Masaka (1.2% and 0.7%).
Conclusions: Limited sensitivity of diagnostic and screening tests led to underestimation of STD prevalence in all three trials but especially in Mwanza. Adjusted prevalences of curable STD were higher in Mwanza than in Rakai and Masaka.
Prevalences of sexually transmitted diseases (STD) are often compared between populations in order to gain a better understanding of STD and HIV epidemiology or to determine which control strategy may be most effective in a given epidemiological situation. Three community randomised trials of STD treatment as an HIV prevention strategy have been conducted in east Africa: in Mwanza, Tanzania, and in Masaka and Rakai, Uganda.1–3 In the Mwanza trial, improved STD case management was associated with a 38% reduction in HIV incidence.4 In contrast, in Uganda, a trial of information, education, and communication coupled with improved STD case management had no impact on HIV incidence in Masaka district,5 and STD mass treatment had no impact on HIV incidence in Rakai district.3 Baseline prevalence of gonorrhoea and chlamydia was about 5% or less in these populations, while that of serological syphilis was 10–15%. Various hypotheses have been suggested to explain the apparently contrasting results: that the Mwanza and Rakai trials took place at different stages of the HIV epidemic; that the proportion of ulcers due to herpes simplex virus type 2 (HSV-2) was higher in Rakai than in Mwanza; and that the interventions differed, continuous treatment of symptomatic STD (as in Mwanza) possibly having a larger effect than periodic treatment of all STD (as in Rakai).6 Another potential explanation is that the prevalence of curable STD differed between the three sites. Given the reduction in HIV incidence in Mwanza and the association between STD and HIV transmission, we would expect STD prevalence to have been higher in Mwanza than in Masaka and Rakai. However, the STD prevalences reported in the three trials cannot be directly compared, because different diagnostic tests and sampling strategies were used.
STD prevalences observed in surveys, such as those in the east African trials, depend on how the sample is selected for measurement, how the prevalence measured in this sample is extrapolated to the general population, and which diagnostic technique is used. Selection of the population in which the STD is measured may bias observed STD prevalence either upward or downward. Diagnostic errors can likewise lead to either overestimation or underestimation of prevalence, and their net effect depends on the true prevalence: if prevalence is low, specificity matters most, whereas if prevalence is high, sensitivity is critical.
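A short Python sketch makes this dependence on prevalence concrete by computing the expected observed prevalence from a known true prevalence. The test characteristics (90% sensitivity, 98% specificity) and the function name are hypothetical, chosen only for illustration.

```python
def expected_observed(p_true: float, se: float, sp: float) -> float:
    """Expected apparent prevalence for a test with sensitivity se and specificity sp."""
    detected = p_true * se                          # infected people who test positive
    false_positives = (1.0 - p_true) * (1.0 - sp)   # uninfected people who test positive
    return detected + false_positives


# Hypothetical test with 90% sensitivity and 98% specificity.
for p_true in (0.01, 0.40):
    p_obs = expected_observed(p_true, se=0.90, sp=0.98)
    print(f"true prevalence {p_true:.0%} -> expected observed {p_obs:.1%}")
```

At 1% true prevalence, false positives (about 2% of the population) nearly triple the apparent prevalence, to 2.9%. At 40%, the 4% of the population missed by the test outweighs the 1.2% false positives, so prevalence is underestimated (37.2%).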
The measurement errors in STD prevalence due to screening and diagnostic tests were assessed for the Mwanza, Masaka, and Rakai trials. Sampling bias was also considered when comparing measured prevalences in the three populations. We adjusted the observed baseline prevalences of Neisseria gonorrhoeae (NG), Chlamydia trachomatis (CT), Trichomonas vaginalis (TV), syphilis, and HSV-2 infection for these biases, and then compared prevalences between the three populations. We also considered the implications of these comparisons for the interpretation of the discrepant trial outcomes.
Prevalences by 5 year age group for men and women were adjusted first for the screening and diagnostic techniques used and then for selection bias, as explained below. The overall adjusted prevalence of each infection in adults aged 15–54 years was then obtained by standardising for age, using the average population structure of the three populations based on census data.7,8 When data were not available for the entire 15–54 age range, restricted age ranges were used.
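Direct age standardisation, as described above, is a weighted average of the age-specific adjusted prevalences. The sketch below assumes the standard weights are population proportions by 5 year age group and renormalises them over whichever age groups have data, which is one way to handle a restricted age range. The function name, age groups, prevalences, and weights are hypothetical, not the census-based structure used in the analysis.

```python
from typing import Dict


def age_standardise(prevalence_by_age: Dict[str, float],
                    standard_weights: Dict[str, float]) -> float:
    """Direct age standardisation over the age groups present in prevalence_by_age."""
    # Only age groups with data contribute; renormalising the weights over them
    # handles a restricted age range (e.g. 15-39 instead of 15-54).
    total_weight = sum(standard_weights[age] for age in prevalence_by_age)
    weighted_sum = sum(prevalence_by_age[age] * standard_weights[age]
                       for age in prevalence_by_age)
    return weighted_sum / total_weight


# Hypothetical adjusted prevalences and a hypothetical standard population.
adjusted = {"15-19": 0.02, "20-24": 0.04, "25-29": 0.03}
standard = {"15-19": 0.30, "20-24": 0.25, "25-29": 0.20, "30-34": 0.25}
print(f"age standardised prevalence: {age_standardise(adjusted, standard):.2%}")
```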
Observed prevalences were adjusted for both the sensitivity and specificity of the diagnostic tests according to equation 1 (reference 9):

$$p_{true} = \frac{p_{observed} + Sp - 1}{Se + Sp - 1} \qquad (1)$$

where $p_{true}$ = true prevalence, $p_{observed}$ = observed prevalence, Se = sensitivity, and Sp = specificity.
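Equation 1 has the form commonly known as the Rogan–Gladen estimator. A minimal Python sketch follows; the function name and the example values are hypothetical, not taken from the trials. The code returns estimates outside [0, 1] unchanged rather than truncating them, since the source does not say how such cases were handled.

```python
def adjust_for_diagnostics(p_observed: float, se: float, sp: float) -> float:
    """Equation 1: correct an observed prevalence for test sensitivity and specificity."""
    youden = se + sp - 1.0
    if youden <= 0:
        raise ValueError("correction undefined unless Se + Sp > 1")
    # Values below 0 or above 1 can occur when p_observed is inconsistent
    # with the assumed Se and Sp; they are returned unchanged here.
    return (p_observed + sp - 1.0) / youden


# Hypothetical example: 5% observed with 60% sensitivity and 100% specificity.
print(f"{adjust_for_diagnostics(0.05, se=0.60, sp=1.00):.2%}")
```

With perfect specificity the correction reduces to dividing the observed prevalence by the sensitivity, which is why low sensitivity pushes adjusted estimates upward.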
For outcomes where diagnostics were performed on only a subset of the population based on results of a screening test, prevalences were also adjusted for the screening algorithm. The formula for this combined adjustment is:

$$p_{true} = \frac{p_{observed} + p_{screen}(Sp - 1)}{Se'(Se + Sp - 1)} \qquad (2)$$

where $p_{true}$ = true prevalence, $p_{observed}$ = observed prevalence assuming those negative on the screening test to be negative, $p_{screen}$ = prevalence of a positive screening test, Se = sensitivity of the diagnostic test, Sp = specificity of the diagnostic test, and Se′ = sensitivity of the screening test. The derivation of equation 2 is given in the appendix.
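Equation 2 can be sketched in Python as follows. The function and argument names are hypothetical; `p_observed` counts screen negatives as negative, as in the definition above, and both proportions are taken over everyone screened.

```python
def adjust_for_screening_and_diagnostics(p_observed: float, p_screen: float,
                                         se: float, sp: float,
                                         se_screen: float) -> float:
    """Equation 2: correct for a screening step followed by an imperfect diagnostic test.

    p_observed: diagnostic positives as a fraction of everyone screened
                (screen negatives counted as negative).
    p_screen:   fraction of everyone screened who were screen positive.
    """
    youden = se + sp - 1.0
    if youden <= 0 or se_screen <= 0:
        raise ValueError("correction undefined for these test characteristics")
    # Diagnostic false positives can arise only among screen positives,
    # hence the p_screen * (sp - 1) term; the screening test's specificity cancels.
    return (p_observed + p_screen * (sp - 1.0)) / (se_screen * youden)


# Consistency check: with no screening step (everyone tested, se_screen = 1)
# equation 2 reduces to equation 1.
print(adjust_for_screening_and_diagnostics(0.05, 1.0, 0.80, 0.99, 1.0))
print((0.05 + 0.99 - 1.0) / (0.80 + 0.99 - 1.0))
```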
Estimates of sensitivity and specificity for diagnostic tests
The sensitivity and specificity of the tests used were estimated from values documented in the literature for populations in any geographical location. Initially, eligible studies were restricted to those conducted in asymptomatic populations; however, for some tests estimates were available only for symptomatic populations, so this criterion was dropped. When multiple studies were available, we averaged test sensitivities and specificities across the published studies and used the mean values to adjust the trial observations. To allow for uncertainty in diagnostic test performance, we combined the highest sensitivity documented in any single study with the lowest specificity documented in any single study to estimate a lower limit for the adjusted prevalence. Similarly, the upper limit was obtained by combining the lowest sensitivity with the highest specificity.
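The point estimate and limits described above can be sketched as follows, applying equation 1 with the mean and extreme published values. The function names and the three "published" sensitivity and specificity values are hypothetical. The ordering of the limits relies on the fact that the equation 1 estimate falls as Se rises and rises as Sp rises, which holds whenever Se exceeds the observed prevalence.

```python
from typing import List, Tuple


def correct(p_observed: float, se: float, sp: float) -> float:
    """Equation 1 correction for sensitivity (se) and specificity (sp)."""
    return (p_observed + sp - 1.0) / (se + sp - 1.0)


def adjusted_with_limits(p_observed: float, se_values: List[float],
                         sp_values: List[float]) -> Tuple[float, float, float]:
    """Return (lower, point, upper) adjusted prevalence from published estimates."""
    mean_se = sum(se_values) / len(se_values)
    mean_sp = sum(sp_values) / len(sp_values)
    point = correct(p_observed, mean_se, mean_sp)
    # Highest Se with lowest Sp gives the smallest correction (lower limit);
    # lowest Se with highest Sp gives the largest correction (upper limit).
    lower = correct(p_observed, max(se_values), min(sp_values))
    upper = correct(p_observed, min(se_values), max(sp_values))
    return lower, point, upper


# Hypothetical: 10% observed; three studies reporting these Se and Sp values.
lower, point, upper = adjusted_with_limits(0.10, [0.70, 0.85, 0.90], [0.98, 0.99, 1.00])
print(f"adjusted prevalence {point:.1%} (range {lower:.1%} to {upper:.1%})")
```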
To obtain estimates of sensitivity and specificity, we included only published studies in which the techniques used in the trials were compared directly in the same population against the same gold standard. These studies typically used an “expanded” gold standard in which discrepant results were resolved using another confirmatory test. Studies using culture alone as the gold standard were excluded, because the imperfect sensitivity of culture would make tests that are more sensitive than culture, such as LCR, appear to have falsely low specificity.10
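The bias from a culture-only gold standard can be illustrated numerically. The sketch below assumes culture is 100% specific and that culture and the new test err independently given true infection status; the function name and the values (10% prevalence, culture sensitivity 80%, new test sensitivity 95% and truly 100% specific) are hypothetical.

```python
def apparent_specificity(prevalence: float, se_culture: float,
                         se_new: float, sp_new: float = 1.0) -> float:
    """Specificity a new test appears to have when culture alone is the reference.

    Assumes culture is 100% specific and that the two tests err independently
    given true infection status.
    """
    missed_by_culture = prevalence * (1.0 - se_culture)    # infected but culture negative
    culture_negative = missed_by_culture + (1.0 - prevalence)
    # Every new-test positive among culture negatives is scored as a false positive,
    # including true infections that culture missed.
    scored_false_pos = missed_by_culture * se_new + (1.0 - prevalence) * (1.0 - sp_new)
    return 1.0 - scored_false_pos / culture_negative


print(f"{apparent_specificity(0.10, se_culture=0.80, se_new=0.95):.1%}")
```

A test that is in fact 100% specific appears to be only about 97.9% specific here, and the apparent shortfall grows as prevalence rises or culture sensitivity falls.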
Gonorrhoea and chlamydia
In Rakai and Masaka, ligase chain reaction (LCR; Abbott Laboratories, Abbott Park, IL, USA) on urine was used to diagnose NG and CT in men and women. In Mwanza, NG was diagnosed in men by Gram stain of urethral smears and in women by culture of endocervical swabs. CT was diagnosed in Mwanza by enzyme immunoassay (EIA; IDEIA Chlamydia, Novo Nordisk Diagnostika, Cambridge, UK) of urethral swabs in men and of endocervical swabs in women.
In Mwanza, a screening test was used to select the sample of men to be tested for NG and CT: testing was confined to men who tested positive on a urine leucocyte esterase dipstick (LED; Nephur-Test + Leuco, Boehringer-Mannheim, Lewes, UK) test, who complained of discharge during the interview, or who had discharge on clinical examination. To allow adjustment for the performance of this screening test (according to equation 2), we estimated its sensitivity from the findings of a separate study conducted in 1996 in a rural community in Mwanza Region11 (see below).
The prevalence of Trichomonas vaginalis in women was assessed by InPouch TV culture (BioMed Diagnostics, San Jose, CA, USA) of self collected vaginal swabs in Rakai and by wet mount microscopy of vaginal smears in Mwanza. TV prevalence was not available for men in any of the three trials or for women in Masaka. From the literature, we included studies that directly compared the performance of InPouch TV culture with that of wet mount microscopy, using either another culture method or polymerase chain reaction (PCR) to resolve discrepant results.
A non-treponemal test, the toluidine red unheated serum test (TRUST; New Horizons, Columbia, MD, USA), was used to screen the Rakai population for syphilis. Those testing positive on TRUST were further tested using the Treponema pallidum haemagglutination assay (TPHA; TPHA Sera-Tek, Fujirebio, Tokyo, Japan).12 In Masaka and Mwanza, TPHA was performed on all study participants, and a non-treponemal rapid plasma reagin (RPR; VD-25 Murex, Dartford, UK) test was performed for those who were TPHA positive.13 We compared the prevalence of serological syphilis (TPHA positive with any reactive RPR/TRUST titre) and the prevalence of active, high titre syphilis (TPHA positive with RPR titre ≥1:8) in all three populations. The RPR and TRUST tests have similar diagnostic performance (98% sensitivity and 99% specificity),14,15 and TPHA was used at all three sites, so no adjustment was needed to compare serological or active high titre syphilis prevalence validly across the sites.
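The two syphilis outcomes compared above can be expressed as a simple classification rule. This sketch assumes the RPR/TRUST result is recorded as the reciprocal of the highest reactive dilution (8 for 1:8, 0 for non-reactive); the helper names and the five example participants are hypothetical.

```python
from typing import List, Tuple


def syphilis_markers(tpha_positive: bool, rpr_titre: int) -> Tuple[bool, bool]:
    """Return (serological, active_high_titre) for one participant.

    rpr_titre is the reciprocal of the highest reactive RPR/TRUST dilution
    (8 means 1:8) and 0 if non-reactive. In a TRUST-first design, people who
    were TRUST negative have titre 0 and so are negative on both markers.
    """
    serological = tpha_positive and rpr_titre >= 1
    high_titre = tpha_positive and rpr_titre >= 8
    return serological, high_titre


def syphilis_prevalences(participants: List[Tuple[bool, int]]) -> Tuple[float, float]:
    """Prevalence of serological and of active high titre syphilis."""
    flags = [syphilis_markers(tpha, titre) for tpha, titre in participants]
    n = len(flags)
    return sum(s for s, _ in flags) / n, sum(h for _, h in flags) / n


# Hypothetical participants: (TPHA positive?, RPR titre reciprocal)
SAMPLE = [(True, 16), (True, 2), (False, 4), (False, 0), (True, 0)]
print(syphilis_prevalences(SAMPLE))
```

Because every high titre case is also a serological case, the high titre prevalence can never exceed the serological prevalence.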
HSV-2 serology for the Rakai trial was conducted at the Centers for Disease Control (CDC) using an immunoblot assay which discriminates between antibodies for HSV-1 and HSV-2.3 The diagnostic adjustments proposed for the Rakai data are based on comparison with another western blot assay as the gold standard.16 Seroprevalence in Masaka and Mwanza was measured using a monoclonal antibody blocking immunoassay test and the diagnostic adjustment is based on the comparison of this test with the aforementioned western blot in a rural African population.17
Estimation of selection bias
If random samples of the trial population were selected or the entire trial population was evaluated, we adjusted STD prevalences for diagnostic test performance only. This was the case for NG, CT, TV, and syphilis in Rakai; for NG, CT, HSV-2 and syphilis in Masaka; and for HSV-2 and syphilis in Mwanza.2,12,13,18 We restricted comparison of STD prevalences to the sex and age ranges for which data were available from all sites. Thus, comparisons for NG and CT were restricted to women and men aged 15–39 years. Comparisons for TV included women aged 15–49 years. We compared HSV-2 seroprevalence in 15–29 year olds and syphilis in 15–54 year olds.
Where only non-random samples of the trial populations had been measured, additional adjustments were considered. In women in Mwanza, prevalences of NG, CT, and TV were measured only among antenatal clinic (ANC) attendees. ANC data from Mwanza town have been shown to underestimate HIV prevalence and syphilis prevalence in women in the general population.19,20 Similarly, unpublished analysis of Rakai trial data found age standardised NG prevalence among 15–39 year olds to be 0.7% among self reported pregnant women and 1.6% among women in the general population. For CT in Rakai, the corresponding prevalences were 2.9% and 2.7% and for TV, prevalences were 24% in both groups (Eline Korenromp, personal communication). These data suggest levels of curable STD are similar or slightly lower in pregnant women compared to the general population in some African populations. However, it is unclear if this can be assumed to apply to women attending antenatal clinics in rural Mwanza. It may be that selection biases associated with antenatal populations differ between urban and rural areas, or that HIV associated effects on fertility may lead to different biases in rural Mwanza (4% HIV prevalence) and Rakai (16% HIV prevalence). Therefore, the Mwanza data were not adjusted for selection of the ANC sample.
HSV-2 seroprevalence in Rakai was measured only among individuals who reported being sexually active. Hence the population adjustment must take into account the proportion sexually active in each age stratum of the sampled group (15–29 year olds). Diagnostic adjustments were first applied to the observed prevalence in each age group. Each adjusted prevalence was then multiplied by the proportion sexually active in that age group, which treats individuals who were not sexually active as uninfected, and finally the prevalence was age standardised.
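The selection adjustment for the Rakai HSV-2 data can be sketched as below. It assumes, as the multiplication implies, that people who are not sexually active are HSV-2 negative; the optional under-reporting scenario used for the upper limit in the sensitivity analysis additionally assumes that unreported sexually active people have the same prevalence as those who reported activity. The function name and all numbers are hypothetical; only the under-reporting fractions (50%, 5%, 0%) come from the text.

```python
from typing import Dict, Optional


def population_hsv2_prevalence(prev_in_active: Dict[str, float],
                               reported_active: Dict[str, float],
                               weights: Dict[str, float],
                               under_reporting: Optional[Dict[str, float]] = None) -> float:
    """Age standardised HSV-2 prevalence in the whole population.

    prev_in_active:  diagnostically adjusted prevalence among those reporting sexual activity
    reported_active: proportion reporting sexual activity in each age group
    under_reporting: optional fraction of those denying sexual activity who were in fact active
    """
    under_reporting = under_reporting or {}
    weighted_sum = 0.0
    total_weight = 0.0
    for age, prev in prev_in_active.items():
        reported = reported_active[age]
        truly_active = reported + under_reporting.get(age, 0.0) * (1.0 - reported)
        # Non-active people contribute zero prevalence.
        weighted_sum += weights[age] * prev * truly_active
        total_weight += weights[age]
    return weighted_sum / total_weight


prev = {"15-19": 0.20, "20-24": 0.35, "25-29": 0.45}       # hypothetical
active = {"15-19": 0.50, "20-24": 0.85, "25-29": 0.95}     # hypothetical
weights = {"15-19": 0.40, "20-24": 0.33, "25-29": 0.27}    # hypothetical
base = population_hsv2_prevalence(prev, active, weights)
upper = population_hsv2_prevalence(prev, active, weights,
                                   {"15-19": 0.50, "20-24": 0.05, "25-29": 0.0})
print(f"base case {base:.1%}, upper limit {upper:.1%}")
```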
A sensitivity analysis was conducted to ascertain the relative importance of the different adjustments needed to compare STD prevalence validly across the sites, using male NG prevalence and female HSV-2 prevalence as examples. To determine the relative importance of the sensitivity and specificity of the diagnostic tests, we varied each separately and then both together. The sensitivity of the screening test and the prevalence of a positive screening test were varied one at a time, together, and in combination with diagnostic test performance. For screening test performance, we used the values observed for NG in men in the Misungwi study as the default, together with lower and upper limits: the lower bound for screening test sensitivity was 44% (half the default value) and the upper bound was 95%. To assess the importance of selection bias, we varied the proportion sexually active in the Rakai HSV-2 data. The reported proportion sexually active was used for the base case adjustment. To estimate an upper limit for HSV-2 prevalence, we assumed that sexual activity was under-reported by 50% among 15–19 year olds (that is, half of those who denied sexual activity were in fact sexually active), by 5% among 20–24 year olds, and by 0% among 25–29 year olds.
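The screening part of this sensitivity analysis can be sketched by re-evaluating equation 2 at the lower bound, default, and upper bound for screening sensitivity (44%, 88%, and 95%), holding the screen-positive proportion at about 25% as reported for Mwanza men. The observed prevalence and the Gram stain sensitivity and specificity below are hypothetical placeholders, not values from the tables, and the function name is illustrative.

```python
def adjust_screened(p_observed: float, p_screen: float, se: float, sp: float,
                    se_screen: float) -> float:
    """Equation 2 correction (screen negatives counted as negative)."""
    return (p_observed + p_screen * (sp - 1.0)) / (se_screen * (se + sp - 1.0))


P_SCREEN = 0.25                 # approximate screen-positive proportion in Mwanza men
P_OBSERVED = 0.015              # hypothetical observed NG prevalence
SE_DIAG, SP_DIAG = 0.90, 0.99   # hypothetical Gram stain performance

estimates = {se_screen: adjust_screened(P_OBSERVED, P_SCREEN, SE_DIAG, SP_DIAG, se_screen)
             for se_screen in (0.44, 0.88, 0.95)}   # lower bound, default, upper bound
for se_screen, estimate in estimates.items():
    print(f"screening sensitivity {se_screen:.0%}: adjusted prevalence {estimate:.2%}")
```

Because Se′ appears only in the denominator of equation 2, halving it from 88% to 44% exactly doubles the adjusted estimate, which is why the lower bound for screening sensitivity has such a large effect.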
Table 1 shows published estimates of the sensitivity and specificity of the diagnostic tests. Where possible, the average sensitivity and specificity for each test are also given. For NG in men, only one study directly comparing the diagnostic properties of LCR on urine to Gram stain of urethral smears was identified for an STD clinic population and so we used data from this study.21 Similarly, only one study was available to determine the diagnostic performance of each of the HSV-2 serological tests. For the tests used in Mwanza, sensitivity was generally low, mean estimates ranging between 49% and 91%. For the Masaka tests, mean sensitivities ranged between 84% and 91%. For the Rakai tests, mean sensitivities ranged between 79% and 92%. Mean specificities were between 93% and 100% for Mwanza and Masaka and were 100% for all tests used in Rakai.
Table 2 lists all adjustments required for the STD prevalence comparison. For diagnostic tests, these were based on the means and ranges given in table 1. For the screening test used for NG and CT testing in men in Mwanza, we estimated sensitivity at 53% for CT and 88% for NG; the prevalence of a positive screening test, needed for the adjustment according to equation 2, was about 25% in all age groups. For the selection bias adjustment of the Rakai HSV-2 data, the measured proportions sexually active in each age group are listed.
Table 3 shows observed prevalences with 95% confidence intervals and adjusted prevalences of CT, NG, TV, and HSV-2 serology. For serological (TPHA+/RPR+) and active high titre (TPHA+/RPR≥1:8) syphilis only the directly observed prevalences are given as adjustment was not necessary. In general, similar sample sizes were used in the three studies so sampling errors of the observed prevalences were similar across the three trials.
In women, observed prevalences of CT and NG were higher in Mwanza than in Rakai and Masaka, and these differences were accentuated after adjustment for diagnostic performance. Observed and adjusted prevalences of CT were higher in Rakai than in Masaka, but NG prevalence was similar in the two Ugandan sites. Observed TV prevalence was similar in Mwanza and Rakai, but after adjustment, prevalence was higher in Mwanza than in Rakai. Observed prevalence of serological syphilis was higher in Rakai and Masaka than in Mwanza, but prevalence of active high titre syphilis was much higher in Mwanza than in Rakai and Masaka. For HSV-2, seroprevalence was slightly lower in the Ugandan sites than in Mwanza, even after allowing for possible under-reporting of sexual activity in Rakai.
In men, observed CT prevalence was higher in Rakai and Masaka than in Mwanza but after adjustment there was little difference between sites. Observed NG prevalence was higher in Mwanza than the Ugandan sites and the difference was accentuated after adjustment. Similar to results for women, the observed prevalence of serological syphilis was higher in Masaka and Rakai than in Mwanza but prevalence of active high titre syphilis was much higher in Mwanza than in Masaka and Rakai. Levels of HSV-2 were higher in Rakai and Masaka compared to Mwanza.
The reported ranges indicate that the results were fairly robust to uncertainty in our estimates of the sensitivity and specificity of the diagnostic tests. The exceptions were CT and TV in women in Mwanza, for which estimated prevalences changed considerably when the mean sensitivity estimates were replaced by values documented in single studies (table 1). The sensitivity analysis shows that the limited sensitivity of the diagnostic tests at all sites led to underestimation of STD prevalence in the trials, whereas specificity did not markedly influence apparent STD rates (table 4). An exception was HSV-2 infection, for which the effects of sensitivity and specificity in Mwanza and Masaka almost cancelled out. The underestimation was accentuated when a screening test with limited sensitivity was used, as was the case when measuring NG and CT among men in Mwanza. Restricting HSV-2 testing in Rakai to those reporting sexual activity did not appreciably bias the apparent prevalence.
Previously published results from the Mwanza, Masaka, and Rakai trials suggested that STD prevalences were comparable in the three sites at baseline.2,12,13 We have shown that, because of diagnostic errors and selection bias, the levels of curable STD may have been underestimated. This was especially the case in Mwanza; taking these factors into account, prevalences of curable STD were estimated to be considerably higher in Mwanza than in Masaka and Rakai for NG, CT (apart from CT in men), TV, and active, high titre syphilis. Because these infections are of short duration, they reflect current sexual behaviour in the populations. By contrast, the prevalences of HSV-2 and serological syphilis tended to be similar or higher in the Ugandan sites compared with Mwanza, but these STD markers reflect long term sexual behaviour in the population. The higher prevalence of short duration STD in Mwanza may be the result of higher risk sexual behaviour in Mwanza than in Masaka and Rakai at the baseline of the trials.22 The similar or higher prevalence of long duration STD markers (HSV-2 and serological syphilis) in the Ugandan sites may be explained by risk behaviour there having been higher in the past than that measured in Mwanza.23–26 This is supported by the higher prevalence of HIV in Rakai (16%) and Masaka (12%) compared with Mwanza (4%). The outbreak of civil unrest in Uganda and subsequent stabilisation may help explain the observed patterns of HIV prevalence and sexual behaviour in the Ugandan sites at the baseline of the trials. The natural dynamics of HIV epidemics may also have played a role.
Several limitations of our analysis must be highlighted. Regarding selection bias, we must consider the selection of the ANC sample in Mwanza, compared with all women in Rakai and Masaka, when comparing NG, CT, and TV prevalences. Our results indicated higher STD prevalence among ANC women in Mwanza than among all women in Rakai and Masaka. About 90% of pregnant women in Mwanza attend ANC, so we would not expect selective factors such as STD symptoms to bias STD prevalence in the ANC sample upward.27 As previously discussed, it seems most likely that the ANC sample in Mwanza underestimated STD prevalence in the general population.19,20 Had we adjusted for this, the STD prevalence in women in Mwanza would have been even higher, further accentuating the difference between sites. Hence, our finding of higher STD prevalence among women in Mwanza was most probably not affected by our decision to ignore selection bias in this sample.
When comparing NG and CT prevalence in men, we have to consider the limitations of the complicated screening algorithm used in Mwanza. We inferred the performance of the screening algorithm (urine LED test and reported symptoms or signs on clinical examination) from its use in a general population survey of men in Misungwi, a rural community in Mwanza Region, in 1996.11 In Misungwi, the screening algorithm was compared with LCR on urine for CT and culture of urethral swabs for NG. This inference was appropriate because the proportions of men who were LED positive, reported symptoms, or had signs on clinical examination were known to be similar in the Misungwi study and the Mwanza trial. In the trial, the reasons for positivity on the screening test (n = 1451 men) were: LED positive only, 92%; discharge or ulcerative symptoms or signs on clinical examination but LED negative, 4%; and symptoms or signs and LED positive, 4%.28 In the Misungwi study of 438 men, 96% of those who tested positive on the screening algorithm had a positive LED test only, 2% had signs or symptoms only, and 2% had symptoms or signs and were LED positive. Furthermore, if the sensitivity of the screening algorithm had differed between the two populations, our adjusted prevalence estimates for the trial population would be biased, but this is probably not the case, since determinants of sensitivity, such as the prevalence of schistosomiasis, are similar in Misungwi and the Mwanza trial communities.
Finally, limitations in the performance of the diagnostic tests influenced the measured prevalences in the trials. Because the Mwanza trial was conducted earlier, it was not feasible to use highly sensitive tests, which were not available or affordable at that time. This explains why diagnostic biases were larger in Mwanza than in Masaka and Rakai. In this analysis, low sensitivity played a more critical role than low specificity, and most adjustments of prevalence were upward (table 3) because the specificities were high (table 1). For NG and CT in men in Mwanza, the effect of the low sensitivity of the diagnostic test was compounded by the low sensitivity of the screening test. The true magnitude of these diagnostic biases is, however, uncertain, since we used estimates of test sensitivity and specificity mostly from laboratory evaluations in Western settings (table 1), where sensitivity and specificity may be better than under field conditions at the rural African sites. It is unclear how much diagnostic test performance may have declined in the trial settings. In addition, most estimates of sensitivity and specificity came from symptomatic (STD clinic) populations. Test sensitivity might be lower in asymptomatic individuals, which would lead to further underestimation of STD prevalence at all sites. Moreover, the gold standards used to measure sensitivity and specificity may be imperfect, which would limit our ability to estimate true absolute prevalence. However, the same gold standards were used for the different tests, so our comparison of relative prevalences across the three populations should be valid even if the absolute prevalences are biased by an imperfect gold standard.
This illustration of the importance of sample selection and diagnostic techniques in biasing observed STD prevalences has implications for the conduct and interpretation of randomised trials and other surveys with STD prevalence outcomes. The complexity of the adjustments made for the screening algorithm in Mwanza indicates it is preferable wherever possible to use random subsamples of the general population to monitor STD prevalence.
With respect to diagnostic biases in intervention trials, not only their effects on baseline prevalence but also their effects on the observed impact of the intervention on STD prevalence must be considered. Limited diagnostic accuracy will in general dilute the observed impact on prevalence (that is, differences between arms at follow up or reductions over time in one arm). In this respect, limited specificity is generally of more concern than limited sensitivity. As an example, assume the observed impact of an intervention on CT was to reduce prevalence from 4% to 2% (relative risk = 0.5, absolute difference = −2%). With 100% sensitivity and 99% specificity, the true prevalence, after adjustment for imperfect specificity, would be about 3% in the comparison arm and 1% in the intervention arm, corresponding to a true relative risk of 0.33. If sensitivity is limited (90%) but specificity is 100%, the relative difference in true prevalence between arms or time points is the same as that in observed prevalence (0.5), but the absolute difference is larger (−2.2%). If both are limited by these assumed amounts, the true relative risk is 0.33 and the true absolute difference is −2.25%.
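The worked example above can be reproduced by applying equation 1 to the observed prevalence in each arm (4% and 2%) under the three assumed combinations of sensitivity and specificity. The function and dictionary names are illustrative only.

```python
def correct(p_observed: float, se: float, sp: float) -> float:
    """Equation 1 correction for sensitivity (se) and specificity (sp)."""
    return (p_observed + sp - 1.0) / (se + sp - 1.0)


OBSERVED_CONTROL, OBSERVED_INTERVENTION = 0.04, 0.02
scenarios = {
    "Se 100%, Sp 99%": (1.00, 0.99),
    "Se 90%, Sp 100%": (0.90, 1.00),
    "Se 90%, Sp 99%": (0.90, 0.99),
}
for label, (se, sp) in scenarios.items():
    control = correct(OBSERVED_CONTROL, se, sp)
    intervention = correct(OBSERVED_INTERVENTION, se, sp)
    rr = intervention / control
    diff = intervention - control
    print(f"{label}: RR = {rr:.2f}, difference = {100 * diff:.2f} percentage points")
```

The three lines give relative risks of 0.33, 0.50, and 0.33 and absolute differences of about −2.02, −2.22, and −2.25 percentage points, matching the figures quoted above: imperfect specificity dilutes the relative risk, whereas imperfect sensitivity dilutes only the absolute difference.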
In conclusion, the biases in STD prevalences in the general population cohorts of the three trials, and particularly in Mwanza, illustrate the critical attention that should be paid to selecting the population in which the STD is to be measured, and the diagnostic technique to be used, when conducting a research study with STD prevalence outcomes. Notably, besides choosing diagnostic tests with good properties, it is preferable to screen whole populations or random samples, and to avoid screening algorithms based on symptomatology, which miss asymptomatic infections. In intervention trials, high specificity is critical to avoid dilution of impact measures. Our adjustments suggest that, after taking into account diagnostic error and selection bias, the prevalences of curable STD were higher in Mwanza than in Masaka and Rakai at the baseline of the trials. Based on this, we might expect STD treatment to have a larger impact on HIV incidence in Mwanza than in the Ugandan sites. This finding may, in part, explain why STD treatment was associated with lower HIV incidence in Mwanza but not in the Ugandan sites.
Derivation of adjustment for diagnostics and screening, equation 2
Se = sensitivity of diagnostic test, Sp = specificity of diagnostic test, Se′ = sensitivity of screening test, Sp′ = specificity of screening test.
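Let p denote the true prevalence, and assume that the screening and diagnostic test results are independent given true infection status and that only screen positive individuals receive the diagnostic test. Then the expected cell frequencies, as proportions of the population, are as follows:

- Screen positive, diagnostic test positive: $p\,Se'\,Se + (1-p)(1-Sp')(1-Sp)$
- Screen positive, diagnostic test negative: $p\,Se'(1-Se) + (1-p)(1-Sp')\,Sp$
- Screen negative (not tested): $p(1-Se') + (1-p)\,Sp'$

Hence $p_{screen} = p\,Se' + (1-p)(1-Sp')$ and $p_{observed} = p\,Se'\,Se + (1-p)(1-Sp')(1-Sp)$. Substituting $(1-p)(1-Sp') = p_{screen} - p\,Se'$ gives $p_{observed} = p\,Se'(Se + Sp - 1) + p_{screen}(1 - Sp)$, so the specificity of the screening test, Sp′, drops out. Solving for $p = p_{true}$ yields equation 2 in Methods:

$$p_{true} = \frac{p_{observed} + p_{screen}(Sp - 1)}{Se'(Se + Sp - 1)}$$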
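Equation 2 can also be checked numerically. The Monte Carlo sketch below, using only Python's standard `random` module, simulates the screening-then-diagnosis design under the assumption that screening and diagnostic errors are independent given infection status, applies equation 2 to the simulated proportions, and should recover the true prevalence whatever the screening test's specificity. The function name and all parameter values are hypothetical.

```python
import random


def simulate_equation_2(p_true: float, se: float, sp: float, se_screen: float,
                        sp_screen: float, n: int = 200_000, seed: int = 1) -> float:
    """Simulate screening followed by diagnosis, then apply equation 2."""
    rng = random.Random(seed)
    screen_pos = 0
    diag_pos = 0
    for _ in range(n):
        infected = rng.random() < p_true
        # Screening and diagnostic errors are drawn independently given infection status.
        if rng.random() >= (se_screen if infected else 1.0 - sp_screen):
            continue  # screen negatives are not tested and count as negative
        screen_pos += 1
        if rng.random() < (se if infected else 1.0 - sp):
            diag_pos += 1
    p_screen = screen_pos / n
    p_observed = diag_pos / n
    return (p_observed + p_screen * (sp - 1.0)) / (se_screen * (se + sp - 1.0))


# Hypothetical parameters; the estimate should be close to the true 5%
# for different screening specificities.
for sp_screen in (0.78, 0.50):
    print(sp_screen, round(simulate_equation_2(0.05, 0.90, 0.99, 0.88, sp_screen), 4))
```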
This work was supported by the UK Department for International Development as part of a project to perform a comparative analysis of the cost effectiveness of intervention strategies for the prevention of HIV transmission. We would like to thank Dik Habbema for his helpful comments on the manuscript. The authors also thank GlaxoSmithKline for financial support.
This study was designed, implemented, and analysed by KO, EK, and RW. It is part of a larger project which was planned, designed, implemented, and supervised by HG and RH and involved close collaboration with RG, AK, DS, SV, and MW. The collection, analysis, and use of trial data was conducted by PH, AK and AO for the Masaka data, RG, DS, and MW for the Rakai data and HG, RH, and JC for the Mwanza data. The manuscript was prepared by KO, EK, HG, and RH with contributions from all other authors.