A prediction rule for selective screening of Chlamydia trachomatis infection
- 1Municipal Public Health Service Rotterdam, the Netherlands
- 2STI AIDS (SOA AIDS Nederland) Amsterdam, the Netherlands
- 3Municipal Public Health Service Groningen, the Netherlands
- 4Municipal Public Health Service Eastern South Limburg, the Netherlands
- 5Department of Public Health, Erasmus MC, University Medical Center Rotterdam, the Netherlands
- 6Municipal Public Health Service ‘Hart voor Brabant’, the Netherlands
- Correspondence to: Ms H M Götz, Municipal Health Service Rotterdam, Department Infectious Diseases, PO Box 70032, 3000 LP Rotterdam, the Netherlands
- Accepted 24 June 2004
Background: Screening for Chlamydia trachomatis infections is aimed at the reduction of these infections and subsequent complications. Selective screening may increase the cost effectiveness of a screening programme. Few population based systematic screening programmes have been carried out and attempts to validate selective screening criteria have shown poor performance. This study describes the development of a prediction rule for estimating the risk of chlamydial infection as a basis for selective screening.
Methods: A population based chlamydia screening study was performed in the Netherlands by inviting 21 000 15–29 year old women and men in urban and rural areas for home based urine testing. Multivariable logistic regression was used to identify risk factors for chlamydial infection among 6303 sexually active participants, and the discriminative ability was measured by the area under the receiver operating characteristic curve (AUC). Internal validity was assessed with bootstrap resampling techniques.
Results: The prevalence of C trachomatis (CT) infection was 2.6% (95% CI 2.2 to 3.2) in women and 2.0% (95% CI 1.4 to 2.7) in men. Chlamydial infection was associated with high level of urbanisation, young age, Surinam/Antillian ethnicity, low/intermediate education, multiple lifetime partners, a new contact in the previous two months, no condom use at last sexual contact, and complaints of (post)coital bleeding in women and frequent urination in men. A prediction model with these risk factors showed adequate discriminative ability at internal validation (AUC 0.78).
Conclusion: The prediction rule has the potential to guide individuals in their choice of participation when offered chlamydia screening and is a promising tool for selective CT screening at population level.
- AUC, area under the receiver operating characteristic curve
- AAD, area address density
- MHS, Municipal Public Health Service
- PID, pelvic inflammatory disease
Chlamydia trachomatis (CT) infection is the most prevalent sexually transmitted bacterial infection. It is usually asymptomatic and persistent in nature, and distributed widely in the population, particularly in young people.1 The prevalence of chlamydial infection has increased recently in many countries, including the Netherlands.2–5 In women, chlamydial infections are a major cause of pelvic inflammatory disease (PID), ectopic pregnancy, tubal infertility, and chronic abdominal pain.1 Active case finding and early treatment are crucial strategies to reduce transmission. Systematic screening of women has been shown to reduce the incidence of PID and ectopic pregnancy.6,7 Simple screening strategies (for example, home based) to detect people with an asymptomatic infection have become feasible through improved methods for detecting C trachomatis in urine8,9–11 and through the availability of effective single dose treatment. Universal screening is not likely to be cost effective in a population with relatively low chlamydia prevalence. Selective screening, incorporating risk assessment, may increase the cost effectiveness and confront fewer individuals with an unnecessary test. However, it could lead to an unacceptably high proportion of missed infections. Selective screening criteria for women have been applied in various clinic based, opportunistic chlamydia screening programmes, but their effectiveness has not been evaluated sufficiently.12,13 Selection criteria for both sexes have been studied recently in population based screening programmes, but these have not led to practical guidelines for selection.14,15
The objective of our study was firstly to describe risk factors for chlamydial infection among sexually active responders in a large population based chlamydia screening pilot study, including men and women aged 15–29 years from both urban and rural areas in the Netherlands (see p 17, this issue).16 Secondly, we wanted to identify a combination of risk factors that discriminated adequately between those who are infected and those who are not.
The data of this study were collected in a national probability survey in the Netherlands, which was implemented in four Municipal Public Health Service (MHS) areas and stratified according to area address density (AAD). From September 2002 through March 2003, 12 000 women and 9000 men aged 15–29 years received a package by post with a urine sampling kit and a questionnaire concerning demographic data (sex, age, self assigned ethnicity, education), symptoms, history of STI, and sexual behaviour. Urine analysis was done by nucleic acid amplification test (PCR, Roche, Basel, Switzerland). The method of sampling and screening as well as response rates, non-response, and weighted prevalence among all participants are described elsewhere.16 The present analysis is restricted to those participants who reported sexual activity in the last six months, because risk factors were only available for this group. The Medical Ethics Committee of the Free University Amsterdam approved the study.
Univariate logistic regression analyses were performed, with self reported characteristics as independent variables and diagnosis of C trachomatis as the dependent variable. For the odds ratios, 95% confidence intervals (CI) were calculated. Variables showing an association of p<0.2 were included in the multivariable analysis. Backward stepwise selection was performed with a p value for the likelihood ratio test >0.10 as the criterion for elimination of variables from the model. Interactions between predictors and sex were assessed to study whether effects of predictors were different for men and women. The goodness of fit (reliability) of the model was tested by the Hosmer-Lemeshow statistic. The model’s ability to discriminate between participants with or without a chlamydial infection was quantified by using the area under the receiver operating characteristic curve (AUC). AUC values 0.7–0.8 are considered acceptable, 0.8–0.9 excellent, and >0.9 outstanding.17 Calibration was assessed graphically by plotting observed frequencies of chlamydial infection against predicted probabilities.
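The AUC described above can be read as the probability that a randomly chosen infected participant receives a higher predicted risk than a randomly chosen uninfected one. The following is a minimal sketch of that pairwise definition, not the routine used in the study (which relied on SPSS and S-plus); the function name `auc` and the toy inputs are illustrative assumptions.

```python
def auc(predicted, infected):
    """Area under the ROC curve via pairwise comparison.

    predicted: predicted risks (or sum scores), one per participant.
    infected:  1 if the participant tested positive, 0 otherwise.
    Ties between a case and a non-case count as half a concordant pair.
    """
    cases = [p for p, y in zip(predicted, infected) if y == 1]
    controls = [p for p, y in zip(predicted, infected) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one non-case")
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Toy data (hypothetical, not study data): 3 of 4 case/non-case pairs are ranked correctly.
print(auc([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1]))
```

The pairwise loop is quadratic in the number of participants; for a data set of several thousand records a rank based implementation would be faster, but the definition is the same.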
Screening criteria are known to perform too optimistically in the study population from which the model was developed. The internal validity of the regression model was therefore assessed to estimate the performance of the model in new participants, similar to the population used to develop the model. We used bootstrapping techniques: random samples, with replacement, were taken one hundred times from the study population. At each step predictive models were developed, including variable selection.18–20 Bootstrapping may help to reduce the bias in the estimated regression coefficients, and give an impression of the discriminative ability in similar screening participants. The outcome is a correction factor for the AUC, and a shrinkage factor to correct for statistical over-optimism in the regression coefficients and to improve calibration of the model in future participants.18,21,22 External validity was assessed by leaving out the four MHS in the sample one by one, and fitting regression models, including variable selection, on the remaining data. The discriminative ability of this model was assessed externally on the MHS data not included in the fitting procedure. This procedure replicates the situation in which the prediction model is applied in another MHS region with a population that may to some extent be different.
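The bootstrap correction for optimism can be sketched as follows. This is an illustration of the general technique under stated assumptions, not the S-plus Design library routine used in the study: `fit` stands for the complete model building procedure (including variable selection) and returns a scoring function, `performance` stands for a discrimination measure such as the AUC, and both names, like `optimism_corrected`, are hypothetical.

```python
import random


def optimism_corrected(rows, labels, fit, performance, n_boot=100, seed=0):
    """Estimate optimism in a performance measure by bootstrap resampling.

    rows:        predictor records, one per participant.
    labels:      1 for infected, 0 for not infected.
    fit:         callable(rows, labels) -> scoring function(row) -> number.
    performance: callable(scores, labels) -> number (for example the AUC).
    Returns (apparent, mean_optimism, corrected).
    """
    if len(set(labels)) < 2:
        raise ValueError("need both infected and uninfected participants")
    rng = random.Random(seed)
    n = len(rows)

    # Apparent performance: model developed and evaluated on the same data.
    model = fit(rows, labels)
    apparent = performance([model(r) for r in rows], labels)

    differences = []
    while len(differences) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        b_rows = [rows[i] for i in idx]
        b_labels = [labels[i] for i in idx]
        if len(set(b_labels)) < 2:
            continue  # resample contains only one outcome class; draw again
        b_model = fit(b_rows, b_labels)  # repeat the whole fitting procedure
        on_boot = performance([b_model(r) for r in b_rows], b_labels)
        on_orig = performance([b_model(r) for r in rows], labels)
        differences.append(on_boot - on_orig)

    mean_optimism = sum(differences) / n_boot
    return apparent, mean_optimism, apparent - mean_optimism
```

The corrected value is the apparent performance minus the average optimism, which is the kind of adjustment reported in this paper (an apparent AUC of 0.81 reduced by 0.03 to 0.78).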
For the presence or level of each characteristic in the regression model, a score was calculated, based on the regression coefficients with rounding to simplify the calculation in practice. These scores are an immediate reflection of the logarithm of the odds ratios.23 For each individual these scores were added into a sum score, on the basis of which a regression formula was calculated, taking into account the shrinkage factor derived from the bootstrap procedure. An estimate of the probability of chlamydial infection can be calculated through the regression formula $p(\mathrm{Ct}) = 1/(1 + \exp(-\mathrm{LPS}))$, where LPS is the linear predictor for the score. All possible sum scores and their corresponding predicted probabilities of chlamydial infection were combined in a graph with 95% CIs of the predicted probabilities. The confidence interval was calculated, based on a covariance matrix. The average standard error (SE) of the rounded linear predictor values was used to calculate the 95% CIs of the predicted probabilities, $1/(1 + \exp(-(\mathrm{LPS} \pm 1.96 \times \mathrm{SE})))$.24
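The conversion from a linear predictor to a predicted probability with its 95% CI can be sketched as below. The function takes the linear predictor and its standard error directly, because the intercept and slope that map a sum score to LPS are given in table 2 and are not reproduced here; the name `predicted_probability` and the input values in the usage line are illustrative assumptions.

```python
import math


def predicted_probability(lps, se, z=1.96):
    """Return (probability, lower, upper) for a linear predictor value.

    lps: linear predictor on the log odds scale.
    se:  standard error of the linear predictor.
    The interval is computed on the log odds scale and then transformed,
    so it always stays between 0 and 1.
    """
    def inverse_logit(x):
        return 1.0 / (1.0 + math.exp(-x))

    return (
        inverse_logit(lps),
        inverse_logit(lps - z * se),
        inverse_logit(lps + z * se),
    )


# Hypothetical inputs, chosen only to show the call.
p, lower, upper = predicted_probability(lps=-2.0, se=0.3)
print(f"{p:.3f} ({lower:.3f} to {upper:.3f})")
```

Because the interval is symmetric on the log odds scale, it is asymmetric around the probability itself, which is why reported intervals such as those in fig 1 are wider above the point estimate than below it at low probabilities.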
For consecutive cut offs of the sum scores, sensitivity, specificity, fraction positive, and positive predictive values were calculated. Statistical analysis was done with SPSS statistical software version 10.0 (SPSS Inc, Chicago, IL, USA) and with the Design Library for S-plus 2000 (Insightful Inc, Seattle, WA, USA).
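The four measures calculated for each cut off follow from a two by two table of selection (sum score at or above the cut off) against infection status. A minimal sketch, assuming the data are available as two parallel lists; the function name `cutoff_metrics` and the toy data are hypothetical.

```python
def cutoff_metrics(sum_scores, infected, cutoff):
    """Summarise selective screening of everyone with sum score >= cutoff.

    Assumes at least one infected and one uninfected participant.
    """
    tp = fp = fn = tn = 0
    for score, case in zip(sum_scores, infected):
        selected = score >= cutoff
        if selected and case:
            tp += 1  # infected and screened
        elif selected:
            fp += 1  # screened without infection
        elif case:
            fn += 1  # infection missed by the selection
        else:
            tn += 1  # correctly not screened
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "fraction_positive": (tp + fp) / total,
        # Positive predictive value: prevalence within the screened group.
        "ppv": tp / (tp + fp) if (tp + fp) else float("nan"),
    }


# Toy data (hypothetical): eight participants, three of them infected.
scores = [2, 5, 8, 9, 11, 3, 8, 6]
status = [0, 0, 1, 0, 1, 0, 0, 1]
print(cutoff_metrics(scores, status, cutoff=8))
```

Repeating the call for each consecutive cut off yields one row of a table like table 3.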
Prevalence among sexually active participants
The participation rate was 41% and the prevalence of chlamydial infection among sexually active responders was 2.3% (160/7005).16 Among the 6303 participants who reported being sexually active in the previous six months, 153 tested positive (2.4% (95% CI 2.1 to 2.8)). The prevalence was 2.6% (95% CI 2.2 to 3.2) in women and 2.0% (95% CI 1.4 to 2.7) in men.
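The prevalence and its 95% CI among the 6303 recently sexually active participants can be recomputed from the counts. The paper does not state which interval method was used; the Wilson score interval below is an assumption, and the function name `wilson_interval` is illustrative.

```python
import math


def wilson_interval(events, n, z=1.96):
    """Wilson score interval for a proportion (assumed method, see lead-in)."""
    p = events / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, centre - half_width, centre + half_width


# Counts reported in the text: 153 positives among 6303 participants.
p, lower, upper = wilson_interval(153, 6303)
print(f"{100 * p:.1f}% (95% CI {100 * lower:.1f} to {100 * upper:.1f})")
```

With these counts the sketch gives 2.4% (95% CI 2.1 to 2.8), in agreement with the reported figures to one decimal place.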
Performance of predictive model and development of prediction score
Multivariable logistic regression analysis showed that chlamydial infection was associated with high urbanisation, young age, ethnicity (Surinamese/Antillian), low/intermediate education, multiple lifetime partners, a new contact in the previous two months, no condom use at last sexual contact, and complaints of (post)coital bleeding in women and frequent urination in men (table 1). The only statistically significant interaction term in the model was sex and the number of lifetime partners.
The Hosmer-Lemeshow goodness of fit test had a p value of 0.12, indicating adequate goodness of fit. The model discriminated well between participants who were and were not infected by C trachomatis, with an AUC of 0.81 (95% CI 0.77 to 0.84). Internal validation showed optimism in the AUC of 0.03, resulting in a correction of the AUC from 0.81 to 0.78. In the external validation similar sets of predictors were selected. When tested in each separate MHS, the AUC varied from 0.74 to 0.80. When leaving out the MHS representing mainly AAD 1 and 2, ethnicity did not remain in the model developed from the three other MHS areas. This is related to the finding that the majority of non-Dutch participants in our study population were from this particular MHS area.
Table 2 shows the scores of the prediction rule. The sum score for a 16 year old Surinamese woman living in a moderately urbanised area, with intermediate education, three lifetime partners, and a new contact in the previous two months, no postcoital bleeding, and condom use during last intercourse, is 11 (1 + 2 + 2 + 2 + 3 + 1 + 0 + 0). The predicted probability of chlamydial infection for this participant is 11% (95% CI 6 to 20) (fig 1). The discrimination on the basis of the sum score was as good as the discrimination of the original model (AUC 0.80 (0.76–0.84)).
Plots of observed frequency of infection against predicted probabilities showed that calibration of both the model and the score was good for predicted probabilities up to 10% (see http://www.stijournal.com/supplemental for fig 2).
Application of the prediction rule
The probability of chlamydial infection according to the prediction rule can be used for selection in chlamydia screening. Table 3 shows the results for different cut off levels of sum scores. The first row gives the scenario for performing screening in our whole study population and therefore identifying all patients with a C trachomatis infection (sensitivity 100%). When screening is performed in all sexually active participants with a sum score ⩾8, the number to be screened in our study population would be reduced to 33%. However, 21% of the cases would then be missed (sensitivity 79%). The expected prevalence in the screened group would be 5.7%, in contrast to 2.3% on average. By lowering the cut off from a sum score of ⩾8 to ⩾6, one would have to screen an additional 30% of the population to find 93% of the cases. By doing this, the percentage of unnecessarily screened people in the study population would increase from 32% to 62%.
In this large, population based study demographic, behavioural, clinical, and geographic risk factors in 15–29 year old women and men were identified from which a prediction rule for C trachomatis infection could be developed. This study has led to a promising tool for selective chlamydia screening at population level.
Risk factors identified
Young age predicted chlamydial infection independently, as has been reported by others.25 Surinamese/Antillian ethnicity proved to be a strong predictive factor, confirming previous findings in Amsterdam.15,26 Contrary to other population based studies, we observed low and intermediate education to be predictive for chlamydial infection in both sexes.15,25,27 Ethnicity and level of education as risk factors may merely reflect risky sexual behaviour. Nevertheless, we assume the independent character of these variables to reflect risks involved in sexual partner choice: in case of unsafe sex, acquisition of a chlamydial infection is related to chlamydia prevalence background rates within particular sexual networks. Area address density, a geographic factor, remained an independent risk factor for chlamydial infection. As expected, people living in very highly urbanised areas (AAD 1) have the highest risk. However, living in less urbanised areas (AAD 2–4) was also associated independently with chlamydia infection. This finding may be important for decision making regarding future screening programmes. Incorporating AAD score points in selective screening decisions takes care of variations in prevalence within and between regions.28 Although symptoms of frequent urination and (post)coital bleeding in the previous four weeks were relatively infrequent and have probably not led to healthcare seeking behaviour, they predicted chlamydial infection. The number of lifetime partners was a strong independent predictor for chlamydial infection, but with a difference in the strength of association for men and women. Other indicators of sexual behaviour that proved predictive were a new contact in the previous two months, and unsafe sex at last contact. This finding is in line with systematic and opportunistic screening programmes in women.15,25,27,29 Young age at first sex and multiple partners in the previous six months were significant univariable risk factors but did not remain in the model, which can be explained by correlation with lifetime partners.
An important objective of this study was to develop a prediction model, based on risk factors that discriminate adequately between those who are infected with C trachomatis and those who are not. Logistic regression is the most appropriate statistical technique to achieve this goal. Decisions about selection in screening could also be based on a decision tree type model, but in comparative studies the performance of classification and regression trees was not better than classical regression methods.30–32 We therefore preferred logistic regression for our statistical analysis.
In the first instance we had constructed separate models for females and males, but because of low numbers the separate male model was not very robust. Also, most risk factors had very similar effects in both sexes (see http://www.stijournal.com/supplemental for tables 4 and 5). To enhance power, we combined males and females in one model. Interactions between sex and all other determinants for chlamydial infection were tested extensively and the only interaction present was between sex and the number of lifetime partners. This effect was included in the combined model, resulting in different scores for this factor for females and males. The strength of the combined model is illustrated for the variable ethnicity. This variable disappeared in the male model because of a lack of power, causing our separate male model to be awkward to work with in practice. In a combined model, effects in males can be influenced by effects in females, but as the ratio of females to males is approximately 2:1, we consider the balance between the sexes in our combined model to be acceptable.
Performance of screening criteria in a study population is often too optimistic, and is seldom evaluated in another population. This is illustrated by the disappointing performance of selective screening criteria for asymptomatic chlamydial infection in an inner city population33 and in different clinics.12,13,15 Whereas those studies used one part of their data as the development sample and another part to validate their screening criteria, we used bootstrap resampling, which is statistically more efficient.20 Bootstrapping may help to improve the calibration of predictions, and give an impression of the discriminative ability in similar populations. In our test for generalisability (external validation), the model showed acceptable performance for the various MHS regions when using the three other MHS regions for developing the model. The lower AUCs at external validation can be explained to some extent by the sampling method, which was designed to obtain a representative sample for the Netherlands. Not all AAD categories were present in the respective MHS samples. Although our internal and external validation procedures showed satisfactory results in general, further validation is necessary before the prediction rule can be applied reliably in practice. Validation could be done on existing datasets that used similar definitions of the predictor variables and of the presence of chlamydial infection.
A limitation of our data is that we asked for details of sexual behaviour only in people who had been sexually active in the previous six months, as this had consequences for partner tracing. Therefore, multivariable analysis could only be done for 90% (6303) of all sexually active participants and the derived score can be applied only to those who have been sexually active in the previous six months. The prevalence among those ever sexually active, but not in the previous six months, was 1% (7/681). Assuming no recent partner change and condom use at last contact (both score zero) allowed us to estimate the sum score with the available data. We then predicted chlamydial infection among those ever sexually active (through the formula in table 2). The AUC of the prediction in all ever sexually active participants was 0.80 (0.76–0.83) compared with the AUC of 0.81 (0.77–0.84) in the participants who were sexually active recently. This result provides an argument that in practice the prediction rule can be applied to all sexually active people. Another possible limitation of our study is that the relatively low response rate, especially among men, non-Dutch participants, and those with intermediate education, might have affected our results through selection bias.16
Application of the prediction rule for screening
Our sum score allows for prediction of chlamydial infection in individuals as well as applications for cut off values for decisions in screening programmes at population level. Usually a fixed choice of risk factors is used as selection criterion for screening. Instead, our sum score consists of varying combinations of risk factors, mirroring the probability of infection. Not every person has to fulfil a fixed combination of criteria for screening. The sum score can (potentially) guide individuals in their decision to accept the screening test. As we have shown, the predictive value of the screening criterion based on a selection of a score ⩾8 would be 5.7%. Hence 94.3% of the eligible population screened would not have chlamydia. However, the absolute number of people screened unnecessarily is lower than when screening without selection. The issue of the most efficient cut off level depends on both costs and priorities—either finding most cases or minimising unnecessarily screened people. In population based screening—whether in a specified age group in the whole population or in a restricted geographic area—a prediction rule can be applied to motivate people with a score above a certain level to participate. For instance, an invitation letter for screening could include a simple questionnaire for calculating a personal score, together with a request form for a test kit, or a referral to a website. In opportunistic screening, the clinician can inquire about the predictive criteria.
Risk factors for chlamydia can be used for targeted screening and thus may improve the efficiency of screening in population based programmes.
Regression modelling including a validation process can be used to derive a score, which can be applied at an individual level to determine whether screening should be offered.
In a population based study in the Netherlands, prevalence of C trachomatis was 2.6% in women and 2.0% in men. Predictors for chlamydial infection were high urbanisation, young age, ethnicity, low or intermediate education, multiple lifetime partners, a new contact in the previous two months, no condom use at last sexual contact, and complaints of (post)coital bleeding in women and frequent urination in men.
In the population studied, the prediction score had adequate discriminative ability, but because such a score developed for one population tends to perform less well in other populations, it should be subject to external validation.
In conclusion, this study found demographic, geographic, and behavioural characteristics as well as urogenital symptoms as indicators for chlamydial infections in 15–29 year old women and men in a population based study. Our study indicates that one could consider screening all young women and/or men universally, whether systematic or opportunistic, in regions or settings with high prevalence, or apply the predictive score in regions or settings with lower prevalence. The prediction rule for chlamydial infection opens new avenues for risk assessment in population based screening and possibly in opportunistic screening as well.
G Borsboom (statistician, Department of Public Health, Erasmus MC, Rotterdam) assisted in developing the model. The scientific advisory board consisted of: Professor P J E Bindels (Department of General Practice, Academic Medical Centre, University of Amsterdam), A J P Boeke, PhD (Department of General Practice, VU University Medical Centre, Amsterdam), Professor J D F Habbema (Department of Public Health, Erasmus MC, Rotterdam), J A R van den Hoek, PhD (Municipal Public Health Service Amsterdam), S A Morré, PhD (Laboratory of Immunogenetics, VU University Medical Centre, Amsterdam), and L Jacobi MSc (Groningen).
Contributors: HG wrote the first draft and finalised the report. JVB was project leader of PILOT CT. HG, JVB, IV, JB, CH, JR, AC, FDG, DVS, and MV have contributed to the study design and protocol, collected and interpreted data, critically reviewed the draft, and were all involved in the final report. Statistical analysis was performed by IV, HG, JR, and ES.
The PILOT CT study group are: JEAM van Bergen, J Broer, AJJ Coenen, HM Götz, F de Groot, CJPA Hoebe, JH Richardus, DT van Schaik, EW Steyerberg, IK Veldhuijzen, MJC Verhooren.
This research has been financed by a grant from Zorg Onderzoek Nederland, which has no commercial interests and had no role in study design, organisation of the study, and/or writing of the report.
Conflict of interest: none declared