

P563 MSM predictive modeling within a large, linked database of laboratory, surveillance, and administrative healthcare records
  1. Travis Salway¹
  2. Zahid Butt²
  3. Stanley Wong²
  4. Carmine Rossi²
  5. Jason Wong³
  6. Amanda Yu²
  7. Maria Alvarez²
  8. Troy Grennan⁴
  9. Mark Gilbert³
  10. Mel Krajden²
  11. Naveed Janjua²
  1. BC Centre for Disease Control, Vancouver, Canada
  2. BC CDC, Vancouver, Canada
  3. BC Centre for Disease Control, Clinical Prevention Services, Vancouver, Canada
  4. British Columbia Center for Disease Control, Vancouver, Canada


Background Enumerating and measuring populations of men who have sex with men (MSM) is critical to developing and evaluating sexually transmitted and bloodborne infection (STBBI) prevention and treatment programs. However, few data sources record sexual orientation or sexual behaviour. In this study, we present the development and validation of a novel model (i.e., a ‘computational phenotype’) that predicts MSM status from multiple linked data sources.

Methods Three disease case surveillance databases (HIV, hepatitis B and C, and syphilis), a public health laboratory database (which performs ≥95% of all HIV, hepatitis C and syphilis tests in British Columbia), and five administrative health record databases were linked and aggregated, yielding a retrospective cohort of 727,091 adult men from 1990 to 2013. Self-reported MSM status (the ‘gold standard’) from the three disease case surveillance databases was used to develop a multivariable prediction model for identifying MSM in the larger cohort. Models were selected using elastic-net regularization (a combination of lasso and ridge penalties), implemented with the glmnet package in R, and the final model was chosen to maximize the area under the receiver operating characteristic curve (AUC).
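The elastic-net selection step described above can be sketched in plain Python. For a binary outcome, glmnet minimizes the mean negative log-likelihood plus λ[(1 − α)/2 · ‖β‖₂² + α‖β‖₁], where α = 1 gives the lasso and α = 0 gives ridge regression. This is a minimal sketch under stated assumptions, not the study's code: it minimizes that objective by proximal gradient descent (ISTA) instead of glmnet's coordinate descent, picks λ by AUC on a single held-out split (the abstract does not say whether cross-validation was used), and fixes α = 0.5. All function names, the step size, and the iteration count are hypothetical.

```python
import math


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t*|b|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def sigmoid(u: float) -> float:
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def fit_elastic_net_logistic(X, y, lam, alpha, lr=0.1, n_iter=1000):
    """Minimize mean log-loss + lam*((1-alpha)/2*||b||^2 + alpha*||b||_1)
    by proximal gradient descent. The intercept b0 is not penalized.
    Hypothetical stand-in for glmnet's coordinate-descent solver."""
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Gradient of the mean log-loss at the current coefficients.
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(b0 + sum(bj * xij for bj, xij in zip(b, xi))) - yi
            g0 += r / n
            for j in range(p):
                g[j] += r * xi[j] / n
        b0 -= lr * g0
        for j in range(p):
            # Add the ridge gradient, take a gradient step,
            # then apply the lasso proximal (soft-threshold) step.
            g[j] += lam * (1 - alpha) * b[j]
            b[j] = soft_threshold(b[j] - lr * g[j], lr * lam * alpha)
    return b0, b


def predict(b0, b, X):
    """Predicted probability of the positive class for each row."""
    return [sigmoid(b0 + sum(bj * xij for bj, xij in zip(b, xi))) for xi in X]


def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random
    negative (ties count one half). Assumes both classes are present."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def select_lambda(X_tr, y_tr, X_va, y_va, lambdas, alpha=0.5):
    """Fit one model per lambda; return (auc, lambda, b0, b) for the
    model with the highest validation AUC."""
    best = None
    for lam in lambdas:
        b0, b = fit_elastic_net_logistic(X_tr, y_tr, lam, alpha)
        score = auc(predict(b0, b, X_va), y_va)
        if best is None or score > best[0]:
            best = (score, lam, b0, b)
    return best
```

In a setting like the study's, each row of `X` might hold linked-record indicators (for example, a prior gonorrhea diagnosis or an HIV test in the past year) and `y` the self-reported MSM label; that mapping is illustrative only. Larger λ with α > 0 drives more coefficients exactly to zero, which is how the lasso part of the penalty performs variable selection.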

Results A history of gonorrhea and syphilis diagnoses, HIV tests in the past year, a history of visits to clinics identified as serving gay and bisexual men, and residence in MSM-dense neighbourhoods (based on self-reported MSM) were all positively associated with being MSM. The selected model had a sensitivity of 72%, a specificity of 94%, and an AUC of 0.92. Combining self-reported MSM (n=6,280) and predicted MSM (n=85,521), a total of 91,801 men (13% of the cohort) were classified as MSM.
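The reported metrics and counts can be checked with a short sketch. Sensitivity is the fraction of self-reported MSM that the model classifies as MSM, and specificity is the fraction of non-MSM it classifies as non-MSM. Both depend on a score cutoff that the abstract does not report, so `threshold` below is a hypothetical parameter; the cohort arithmetic uses only the counts stated above.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Treat score >= threshold as predicted MSM and compare with the
    reference label (1 = self-reported MSM, 0 = not MSM)."""
    tp = sum(1 for s, lab in zip(scores, labels) if s >= threshold and lab == 1)
    fn = sum(1 for s, lab in zip(scores, labels) if s < threshold and lab == 1)
    tn = sum(1 for s, lab in zip(scores, labels) if s < threshold and lab == 0)
    fp = sum(1 for s, lab in zip(scores, labels) if s >= threshold and lab == 0)
    return tp / (tp + fn), tn / (tn + fp)


# Cohort arithmetic using the counts reported in the Results.
COHORT = 727_091
SELF_REPORTED_MSM = 6_280
PREDICTED_MSM = 85_521

total_msm = SELF_REPORTED_MSM + PREDICTED_MSM  # 91,801 men classified as MSM
share = total_msm / COHORT  # about 0.126, i.e. 13% after rounding
```

Raising `threshold` trades sensitivity for specificity; the 72%/94% pair reported above corresponds to one such operating point on the ROC curve.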

Conclusion Applying a computational phenotyping method to linked administrative data identified 85,521 predicted MSM, a cohort that may be used to monitor and evaluate health outcomes and healthcare utilization. The model's sensitivity and specificity were comparable to those of interviewer-administered self-report measures of sexual orientation.

Disclosure No significant relationships.

  • gay bisexual and other men who have sex with men
  • modeling and prevalence
  • surveillance
