Original article
A validation study of a clinical prediction rule for screening asymptomatic chlamydia and gonorrhoea infections among heterosexuals in British Columbia
  1. Titilola Falasinnu1,
  2. Mark Gilbert2,
  3. Paul Gustafson3,
  4. Jean Shoveller1
  1. 1The School of Population and Public Health, University of British Columbia, Vancouver, British Columbia, Canada
  2. 2British Columbia Center for Disease Control, Vancouver, British Columbia, Canada
  3. 3The Department of Statistics, University of British Columbia, Vancouver, British Columbia, Canada
  1. Correspondence to Dr Titilola Falasinnu, The School of Population and Public Health, University of British Columbia, 2206 East Mall, Vancouver, British Columbia, Canada BC V6T 1Z3; lola.falasinnu{at}


Background One component of effective sexually transmitted infections (STIs) control is ensuring those at highest risk of STIs have access to clinical services because terminating transmission in this group will prevent most future cases. Here, we describe the results of a validation study of a clinical prediction rule for identifying individuals at increased risk for chlamydia and gonorrhoea infection derived in Vancouver, British Columbia (BC), against a population of asymptomatic patients attending sexual health clinics in other geographical settings in BC.

Methods We examined electronic records (2000–2012) from clinic visits at seven sexual health clinics in geographical locations outside Vancouver. The model's discrimination and calibration were examined by the area under the receiver operating characteristic curve (AUC) and the Hosmer–Lemeshow (H-L) statistic, respectively. We also examined the sensitivity and proportion of patients that would need to be screened at different cut-offs of the risk score.

Results The prevalence of infection was 5.3% (n=10 425) in the geographical validation population. The prediction rule showed good performance in this population (AUC, 0.69; H-L p=0.26). Possible risk scores ranged from −2 to 27. We identified a risk score cut-off point of ≥8 that detected cases with a sensitivity of 86% by screening 63% of the geographical validation population.

Conclusions The prediction rule showed good generalisability in STI clinics outside of Vancouver with improved discriminative performance compared with temporal validation. The prediction rule has the potential for augmenting triaging services in STI clinics and enhancing targeted testing in population-based screening programmes.




The imperative to provide more efficient sexual health services by public health programmes has led to the development of service models that optimise the use of health human resources, such as internet-based sexually transmitted infections (STIs) testing and triage services.1–4 The aim of optimising service provision could be facilitated by the use of risk estimation algorithms. In a previous paper,5 a risk estimation algorithm for optimising asymptomatic chlamydia and gonorrhoea case finding was derived using electronic medical records of patient visits at two sexual health clinics in Vancouver, British Columbia (BC). This algorithm combines five risk factors: younger age, non-white race/ethnicity, multiple sexual partners, previous chlamydia diagnosis and previous gonorrhoea diagnosis. The prediction rule will eventually be adapted into a tool for facilitating selective screening in Get Checked Online (GCO), a novel internet-based testing programme in BC.4 ,6

In the derivation stage, we specified that the main intended application of the risk score is to help deal with the increasing numbers of people accessing services in the sexual health clinic contexts, particularly the asymptomatic individuals. However, a prediction rule's accuracy in one context may vary from the performance estimates reported in the derivation stage.7 ,8 Inconsistent performance may reflect artifactual disparities (eg, different study contexts), potentially in combination with genuine disparities (eg, distribution of risk factors).9 Thus, it is important to consider the prediction rule's accuracy in varied settings, such as general practice clinics or hospitals, different types of hospitals or the same type of clinical setting in different geographical locations.9

Before making recommendations regarding the widespread use of the risk estimation algorithm derived through our previous work,5 we conducted an external validation study in an independent setting to further test the parameters of the tool. It is essential to confirm that the algorithm is generalisable to a plausibly related setting (in addition to the previous comparisons conducted with the derivation population) that reflects the level of heterogeneity that will be encountered in real-life applications of the algorithm.10 As was demonstrated previously, the algorithm showed reasonable discrimination and calibration upon validation in two different time periods (ie, temporal validation).11 While temporal validation is often cited as the first step in demonstrating the transferability of a prediction algorithm,12 it cannot assess the utility of the algorithm to other clinics or cities.10 Geographical validation provides a more rigorous proof of validation than temporal validation owing to the hypothesised differences in patient mix, risk factor definitions and disease prevalence.10

It should be appreciated that a prediction rule's performance is often lower upon external validation.13 In these settings, decision-makers have the option to adopt a previously derived prediction rule if it is found to perform adequately or they can derive or remodel a new prediction rule using their own population. Here, we describe the results of a test to assess the generalisability of an algorithm derived in Vancouver against a population of patients attending sexual health clinics in seven other geographical settings in BC. In addition, we examine the implications of deriving a new prediction rule using the data from the geographical validation population (ie, remodelled prediction rule) and compare its performance to the ‘Vancouver’ prediction rule.


The geographical validation dataset was derived from electronic medical records of clients attending publicly funded sexual health clinics in seven locations in BC between 2000 and 2012: Penticton, Kelowna, Kamloops, New Westminster, Boundary, Courtenay and Prince George (see map in online supplementary figure 1). This analysis was limited to clinic visits among asymptomatic women and heterosexual men who were not sexual contacts of STI cases and were not receiving confirmatory positive testing. The aim was to estimate the risk of chlamydia and/or gonorrhoea infection. In the current paper, the original model, regression coefficients and the simplified risk scores derived from the Vancouver clinic data are applied to a geographical validation population.10 A detailed description of the study protocol, analysis plan and predictor definitions has been previously published.14

The ‘Vancouver’ risk estimation algorithm uses a logistic regression formula to relate its five predictors to chlamydia or gonorrhoea risk. The regression coefficients and their associated scoring points are listed in table 1. Multiple imputation methods (five rounds) were used to impute missing values in the geographical validation population.15–17 This analysis imputed missing values using IVEware, a software application that performs multiple imputations of missing values using the Sequential Regression Imputation Method.17 All predictors and the outcome variables were included in the imputation model and the results of the five imputed datasets were combined to obtain final estimates.15 ,16 To assess the performance of the model in the geographical validation population, the model's discrimination was estimated by calculating the area under the receiver operating characteristic curve (AUC). The AUC gives the probability that a randomly selected infected individual would have a higher model-predicted probability of chlamydia or gonorrhoea infection than a randomly selected non-infected individual.18 The closer an AUC is to 1.0 (100%), the better the model discriminates.18
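The probabilistic reading of the AUC given above can be illustrated with a short sketch. It compares every infected visit with every non-infected visit and counts how often the infected one has the higher predicted probability. The function name, the half-credit handling of ties and the toy values are assumptions made for illustration; they are not taken from the study data or its analysis code.

```python
from itertools import product


def auc_pairwise(predicted, infected):
    """Estimate the AUC as the share of (infected, non-infected) pairs in which
    the infected visit has the higher predicted probability.
    Ties count as half a win (an assumption; a common convention)."""
    pos = [p for p, y in zip(predicted, infected) if y == 1]
    neg = [p for p, y in zip(predicted, infected) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one infected and one non-infected visit")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy values, invented purely for illustration.
toy_predicted = [0.02, 0.05, 0.10, 0.20, 0.03, 0.15]
toy_infected = [0, 0, 1, 1, 0, 0]
toy_auc = auc_pairwise(toy_predicted, toy_infected)
```

The pairwise loop is quadratic in the number of visits, so for a dataset of the size analysed here a rank-based calculation would be the practical choice; the sketch only makes the definition concrete.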

Table 1

Prediction rule for quantifying the probability of asymptomatic chlamydia and/or gonorrhoea infection among heterosexuals in Vancouver, British Columbia

Calibration was assessed with the Hosmer–Lemeshow goodness-of-fit statistic, which tests the null hypothesis that there is no difference between the model predictions and the actual observations, using deciles of predicted probabilities to categorise patients.18 A p value >0.05 indicates no significant lack of fit (ie, a good fit).18 The model's calibration was also examined by graphically plotting the prevalence of chlamydia and/or gonorrhoea infection in groups of the simplified risk scores. To aid population-based screening decision-making, a risk score was derived for each clinic visit in the geographical validation population by adding up the scoring points derived from table 1. An evaluation of the sensitivity (or the fraction of infected cases identified) and the proportion of the population that would be screened at different risk score cut points was also performed. A well-performing screening tool detects >90% of cases, while screening <60% of the population.19
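The Hosmer–Lemeshow calculation described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function names are hypothetical, groups are formed by simple equal-sized slicing of the sorted predictions, and the p value assumes a chi-square reference distribution with (groups − 2) degrees of freedom, a convention that some authors change for external validation.

```python
import math


def hosmer_lemeshow(predicted, observed, groups=10):
    """Hosmer-Lemeshow statistic: sum over groups of
    (observed events - expected events)^2 / (n * mean_p * (1 - mean_p))."""
    pairs = sorted(zip(predicted, observed), key=lambda t: t[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        events = sum(y for _, y in chunk)
        mean_p = expected / size
        denom = size * mean_p * (1.0 - mean_p)
        if denom > 0:
            stat += (events - expected) ** 2 / denom
    return stat


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, using the closed form that is
    exact when df is even (df = 8 for ten groups under the assumption above)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Toy example with two groups, invented for illustration only.
toy_pred = [0.1] * 10 + [0.9] * 10
toy_obs = [1] + [0] * 9 + [1] * 9 + [0]
toy_stat = hosmer_lemeshow(toy_pred, toy_obs, groups=2)
```

In the toy example the observed events match the expected events in both groups, so the statistic is essentially zero, which corresponds to a large p value and no evidence of poor fit.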


During the years 2000–2012, there were 10 425 patient visits that met the inclusion criteria at sexual health clinics at the following geographical sites: Penticton, Kelowna, Kamloops, New Westminster, Boundary, Courtenay and Prince George. Online supplementary figure 2 is a flow chart showing the selection of clinic visits whose data comprised this validation study. The prevalence of chlamydia and/or gonorrhoea infection was 5.3% (higher than in the derivation population). Table 2 shows the distribution of the baseline characteristics of patient visits. The derivation population (Vancouver) is included for comparison. The most common demographic characteristics of patient visits in the geographical validation population were male gender (57.5%), age between 20 and 24 years (28.0%) and white ethnicity (74.3%). More than two-thirds of patient visits reported having 1–2 sexual partners in the previous six months and approximately 43% reported consistent condom use. Approximately 3% of patient visits reported injection drug use and the same proportion reported having sex with partners recruited online. Previous chlamydia diagnosis was reported in nearly 16% of patient visits.

Table 2

Population characteristics of heterosexual visits at sexual health clinics in the derivation and geographical validation populations

The geographical validation population differed from the derivation population by having a higher proportion of the following characteristics: women, younger individuals, inconsistent condom use and injection drug use (table 2). There were also some differences between the derivation and geographical validation populations in terms of the unadjusted ORs examining the associations between the predictors and the outcome. Gender and condom use were significantly associated with infection in the geographical validation population—associations that were not significant in the derivation population (table 3). Race/ethnicity was not significantly associated with the outcome in the geographical validation population unlike the derivation population (table 3).

Table 3

Chlamydia and/or gonorrhoea prevalence and unadjusted ORs (derivation and geographic validation populations)*

The Vancouver risk model demonstrated good discrimination in the geographical validation population. The AUC in the geographical validation population was 0.69 (95% CI 0.67 to 0.71), while the AUC in the derivation population was 0.74 (95% CI 0.70 to 0.77) (online supplementary figure 3). A p value of 0.26 for the Hosmer–Lemeshow goodness-of-fit test also indicated good calibration. Online supplementary figure 4 shows that calibration in the geographic validation population was good: the prevalence of chlamydia and/or gonorrhoea infection increased with increasing risk score, from 0.2% in the lowest risk score category to 23.7% in the highest risk score category. We also explored the use of the risk score for selective screening (table 4). This analysis identified a risk score cut-off level of ≥6 points that would identify approximately 95% of infections while screening 78% of the geographical validation population. In the derivation population, the same risk score cut-off of ≥6 points identified 91% of cases and the fraction screened was 68% of the population.

Table 4

Sensitivity and specificity of cut-off scores in the derivation and geographical validation populations
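The two quantities reported for each cut-off, the sensitivity and the fraction of the population screened, can be computed as in the following minimal sketch. The function name and the toy scores are assumptions made for illustration; the scores are not drawn from the study data and do not use the scoring points in table 1.

```python
def screening_performance(scores, infected, cutoff):
    """Return (sensitivity, fraction_screened) when every visit with a
    risk score >= cutoff is screened."""
    if len(scores) != len(infected):
        raise ValueError("scores and infected must have the same length")
    total_cases = sum(infected)
    if total_cases == 0:
        raise ValueError("sensitivity is undefined without any infected visits")
    screened = [s >= cutoff for s in scores]
    detected = sum(1 for flag, y in zip(screened, infected) if flag and y == 1)
    return detected / total_cases, sum(screened) / len(scores)


# Toy risk scores and infection status, invented for illustration only.
toy_scores = [2, 5, 6, 7, 9, 11, 4, 8, 12, 3]
toy_infected = [0, 0, 1, 0, 1, 1, 0, 0, 1, 0]

at_six = screening_performance(toy_scores, toy_infected, cutoff=6)
at_eight = screening_performance(toy_scores, toy_infected, cutoff=8)
```

Raising the cut-off in the toy data lowers the fraction screened and the sensitivity together, which is the trade-off that the choice of cut-off has to balance.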


Validation studies aim to provide evidence that a risk scoring algorithm can be generalised to new populations. The ‘Vancouver’ risk estimation tool showed slightly better discrimination in the geographical validation population (AUC=0.69) than in the temporal validation population (AUC=0.64). The risk estimation tool performed well in the geographical validation population despite the fact that the geographical validation population differed from the derivation population regarding some predictors (eg, age, condom use and previous infection). The geographic validation regions have higher rates of chlamydia and gonorrhoea infection and dissimilar STI epidemiology and social determinants of sexual health;21–23 the temporal validation population also was less heterogeneous than the derivation and geographical validation populations. Further analysis also revealed that a remodelled algorithm using data from the geographical validation population performed no better than the ‘Vancouver’ prediction algorithm as indicated by the non-significant χ2 statistic testing the difference between the AUCs. These findings provide strong evidence that the risk score is robust and valid and likely has generalisable discrimination and calibration in varied settings.

In the geographical validation population, choosing the ‘Vancouver’ cut-off point of ≥6 would require screening 78% of the population to find 95% of the cases and equates to a reduction of 22% in the number of individuals that would need to be screened. However, using the ≥6 cut-off point would fail to meet the efficiency benchmark of screening <60% of the population. Increasing the cut-off point to ≥7 would require screening 67% of the population to achieve a sensitivity of 90%, which would be closer to the efficiency benchmark. In applying the prediction rule to a population with a higher prevalence of infection and risk behaviours (ie, more severe case mix) such as the geographical validation population, it is expected that choosing a higher cut-off point will increase the efficiency of screening decision-making.24 ,25

Alternatively, in this setting, applying the age-based screening criterion (ie, age <25 years) to the geographical validation population would require screening 45% of the population but would detect only 71% of the cases, a performance that falls short of the screening benchmark. However, increasing the cut-off to age <30 years would require screening 64% of the population with a sensitivity of 88%, a performance that is close to the benchmark and also similar to using the cut-off point of ≥7. This finding suggests that using age alone could be a viable option as a screening criterion in the geographical validation population, but not in the derivation population. This finding was not surprising because the distribution of age in the geographical validation population was more heterogeneous than in the derivation population, a situation that often leads to better discrimination and optimum screening performance.25 However, an age-based screening criterion may be contraindicated in settings where the majority of the population presenting for screening consists of younger individuals (eg, youth clinics). Also, if the prediction rule is found to perform less than adequately in these settings, universal screening may be a more suitable alternative.
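The comparison of screening criteria against the efficiency benchmark can be laid out as a small worked example. The sensitivity and fraction-screened percentages below are the ones reported in the text for the geographical validation population; the dictionary layout, the function names and the strict reading of the benchmark (sensitivity above 90% and fewer than 60% screened) are assumptions made for illustration.

```python
# (sensitivity %, fraction screened %) as reported in the text for the
# geographical validation population.
REPORTED = {
    "risk score >= 6": (95, 78),
    "risk score >= 7": (90, 67),
    "age < 25 years": (71, 45),
    "age < 30 years": (88, 64),
}

MIN_SENSITIVITY = 90  # benchmark: detect more than 90% of cases
MAX_SCREENED = 60     # benchmark: screen fewer than 60% of the population


def meets_benchmark(sensitivity, screened):
    """Strict reading of the benchmark (an assumption about its boundaries)."""
    return sensitivity > MIN_SENSITIVITY and screened < MAX_SCREENED


def screening_reduction(screened):
    """Percentage of the population spared screening relative to screening everyone."""
    return 100 - screened


summary = {
    name: {
        "meets_benchmark": meets_benchmark(sens, screened),
        "reduction": screening_reduction(screened),
    }
    for name, (sens, screened) in REPORTED.items()
}
```

Under this strict reading none of the four criteria satisfies both halves of the benchmark, which is consistent with the text describing the ≥7 cut-off and the age <30 years criterion as close to the benchmark rather than meeting it.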

Several studies have established the validity of prediction rules as screening criteria, especially where chlamydia and/or gonorrhoea prevalence is low (ie, <2%), as is the case in the derivation population used in this study.26 Although it has been suggested that universal screening or using criteria based on age would be cost effective in settings with prevalence of 2% or more, publicly funded sexual health services in these settings are constrained by available funding and limited resources. In considering such practical constraints within BC, we suggest a cautious approach to such global screening approaches.26 Specifically, screening women <25 years old could prove to be cost prohibitive in settings where individuals in this age group comprise the highest proportion of clinic visits.27 Caution should also be exercised before using age as a criterion in internet-based testing scenarios such as GCO where good calibration (and not just discrimination) would be essential. In these scenarios, the risk score categories and their associated prevalence prove more useful than age-based criteria for patients trying to decide whether to take the STI test. This is because the process of moving from screening criteria that focus on specific risk factors (eg, age) to prediction rules acknowledges a more comprehensive risk spectrum.28

The findings of this analysis were also compared with other external validation studies that examined the validity of previously derived clinical prediction rules (CPRs) in new geographic settings.24 ,29 Gotz and colleagues derived a prediction rule for chlamydia infection for the selective screening of high-risk individuals in Rotterdam, the Netherlands.24 The prediction rule showed fair external validity in two independent settings: a population-based study in Amsterdam and an outreach screening project among high-risk youth in Rotterdam. The AUC was 0.79 (95% CI 0.76 to 0.84) in the derivation sample, 0.66 (95% CI 0.58 to 0.74) in the Amsterdam sample and 0.68 (95% CI 0.58 to 0.79) in the Rotterdam sample.24 A second study by Haukoos et al29 derived and validated an algorithm to accurately identify patients at risk for HIV infection, using patient data from an STI clinic in Denver, Colorado (1996–2008). Validation was performed using an independent population from an urban emergency department in Cincinnati, Ohio. The risk score showed reasonable generalisability; the AUC was 0.85 (95% CI 0.83 to 0.88) in the derivation sample and 0.75 (95% CI 0.70 to 0.78) in the validation sample.29

The AUC of the ‘Vancouver’ risk estimation tool in the derivation population was lower than the AUCs reported in the two aforementioned studies.24 ,29 This can be explained by the omission of symptoms, which have been shown to be significantly associated with infection, from the ‘Vancouver’ risk estimation tool.30 Specifically, unlike previous risk estimation tools in sexual health settings, the ‘Vancouver’ risk estimation tool was limited to asymptomatic patients, an important improvement because STIs infrequently present with symptoms. The loss in discriminative ability between the derivation and validation populations in the other studies ranged from 10 to 13 percentage points, compared with a loss of 10 and 5 percentage points in the temporal and geographical validation populations, respectively, in this analysis.
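The losses in discriminative ability quoted above follow directly from the AUC values reported in the text, as the short calculation below shows. The AUC pairs are the reported derivation and validation values; the dictionary labels and the function name are assumptions made for illustration.

```python
# (derivation AUC, validation AUC) pairs as reported in the text.
REPORTED_AUCS = {
    "Gotz et al, Amsterdam": (0.79, 0.66),
    "Gotz et al, Rotterdam outreach": (0.79, 0.68),
    "Haukoos et al, Cincinnati": (0.85, 0.75),
    "Vancouver rule, temporal": (0.74, 0.64),
    "Vancouver rule, geographical": (0.74, 0.69),
}


def loss_in_points(derivation_auc, validation_auc):
    """Absolute loss in AUC, expressed in whole percentage points."""
    return round((derivation_auc - validation_auc) * 100)


losses = {name: loss_in_points(d, v) for name, (d, v) in REPORTED_AUCS.items()}

# Range of losses in the two comparison studies.
other_studies = [v for k, v in losses.items() if not k.startswith("Vancouver")]
other_range = (min(other_studies), max(other_studies))
```

The rounding step guards against floating point noise in the subtraction; the underlying arithmetic is simply the difference between the two reported AUCs multiplied by 100.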

There were several strengths to the geographic validation process undertaken here, including the large overall population size, the independence of the clinicians in the geographical validation population from the derivation population, and the systematic analysis of its discrimination and calibration. The current study was the first to derive and validate a locally specific risk assessment tool to quantify STI risk in a Canadian setting. Risk assessment tools ideally should be derived from large representative samples.31 This study included 13 years of electronic health records comprising 40 000 patient visits to publicly funded STI clinics in BC, representing a high percentage of the population of individuals using this service in the province. As with most administrative datasets, the dataset was not deliberately built for the derivation of risk algorithms, resulting in some missing information for several predictors, which we have attempted to mitigate through the use of imputation (something rarely done in prediction modelling studies). The use of imputation techniques yielded discrimination and calibration performance measures similar to those of complete case analyses in which individuals with missing values on any of the considered variables were excluded and baseline analyses in which individuals with missing values on a variable were assumed to be in the lowest category (data not shown).15 This finding suggests that the algorithm was valid despite the consequential risk factor misclassification associated with the data imputation process. Overall, however, the use of imputation techniques offers improved study power and limited bias in the estimated regression coefficients.15
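The text states that results from the five imputed datasets were combined into final estimates but does not spell out the combination rule. One widely used approach is Rubin's rules, sketched below as an assumption rather than as the method confirmed for this study. The function name and the toy coefficient values are hypothetical.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets using Rubin's rules.
    Returns (pooled_estimate, pooled_standard_error)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching estimates and variances from >= 2 imputations")
    pooled = statistics.fmean(estimates)       # average of the m estimates
    within = statistics.fmean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return pooled, math.sqrt(total)


# Toy log-odds estimates and squared standard errors from five imputed
# datasets, invented for illustration only.
toy_estimates = [0.80, 0.85, 0.78, 0.82, 0.80]
toy_variances = [0.010, 0.011, 0.010, 0.012, 0.010]
toy_pooled, toy_se = pool_rubin(toy_estimates, toy_variances)
```

The between-imputation term inflates the pooled standard error to reflect uncertainty about the missing values, which is why imputation-based estimates can be compared fairly with complete case analyses.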

Caution should be exercised in generalising the findings of this analysis to even more diverse geographic settings. Several additional analyses are recommended before the widespread implementation of the risk estimation algorithm. For example, the algorithm's screening performance could be prospectively verified in internet-based STI testing contexts. Furthermore, while the derivation and validation populations are an unbiased representation of STI clinic clients in BC, the current results may not be valid for other settings in BC (or other Canadian provinces, or even other global settings). It would be reasonable to argue that STI clinic clients may also differ significantly from patients seeking care in primary care settings or online contexts; therefore, the results of the current CPR should not be directly extrapolated to other settings without additional validation studies that could provide stronger evidence for the generalisability of the risk estimation algorithm.

In conclusion, a new era in evidence-based decision-making regarding STI testing and progress in relation to the adoption of prediction rules may be at hand. The advent of online approaches to risk estimation, the emergence of new statistical methods, as well as increasingly sophisticated theory, all reflect the potential to continue to make advances in improving testing and treatment of STIs. To date, however, few prediction rules have been validated and, hence, the dissemination and usage of prediction rules in STI service provision remain in the nascent stages. The well-performing prediction rule derived and broadly validated here provides evidence that risk estimation tools have a place in sexual health service provision. New investments in research and practice are required to facilitate the effective integration of prediction rules into routine sexual health service provision and more attention should be paid to their scaling up and to the scientific evaluation of their effects over time.

Key messages

  • This article highlights the geographical validation of the ‘Vancouver’ risk estimation tool for screening asymptomatic chlamydia and gonorrhoea.

  • The prediction tool showed adequate discrimination and calibration upon validation in seven clinics outside of Vancouver.

  • These findings are encouraging and bolster confidence in recommending this tool for use in sexual health services and programmes.

  • The risk score could be easily implemented and is accurate enough to convey important screening considerations.



  • Handling editor Jackie A Cassell

  • Twitter Follow Mark Gilbert at @mpjgilbert

  • Contributors TF was the lead investigator for the empirical work presented and responsible for all major areas of concept formation, data analysis, as well as manuscript composition. MG and PG were involved in the early stages of concept formation and contributed to manuscript edits. PG also contributed data analyses and interpretation. JS was the supervisory author on this project and was involved throughout the project in concept formation and manuscript composition.

  • Funding TF was supported by the Canadian Institutes of Health Research (CIHR) Doctoral Research Award. PG was supported by a grant from the Natural Sciences and Engineering Research Council of Canada.

  • Competing interests None declared.

  • Ethics approval University of British Columbia's Research Ethics Board (certificate # H11-02000).

  • Provenance and peer review Not commissioned; externally peer reviewed.