Background Enumeration or measurement of populations of men who have sex with men (MSM) is critical to developing and evaluating sexually transmitted and bloodborne infection (STBBI) prevention and treatment programs. However, there is a lack of data sources in which sexual orientation or behaviour is measured. In this study, we present the development and validation of a novel model (i.e., ‘computational phenotype’) to predict MSM status using multiple data sources.
Methods Three disease case surveillance databases (HIV, hepatitis B and C, and syphilis), the database of a public health laboratory that performs ≥95% of all HIV, hepatitis C and syphilis tests in British Columbia, and five administrative health record databases were linked and aggregated, resulting in a retrospective cohort of 727,091 adult men from 1990 to 2013. Self-reported MSM status (‘gold standard’) from the three disease case surveillance databases was used to develop a multivariable prediction model for identifying MSM in the larger cohort. Models were selected using ‘elastic-net’ regularization (a combination of the lasso and ridge regression penalties), implemented through the glmnet package in R, and the final model was the one that maximized the area under the receiver operating characteristic curve (AUC).
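For orientation, the elastic-net criterion that glmnet minimizes for a binary outcome can be written as below. This is a sketch that assumes a logistic (binomial) model. The abstract does not state the model family, the mixing parameter α, or the penalty strength λ, so the symbols here are illustrative: y_i is the self-reported MSM indicator for man i, x_i his vector of predictors, and N the number of men with a self-report.

```latex
\min_{\beta_0,\,\beta}\;
  -\frac{1}{N}\sum_{i=1}^{N}
    \Bigl[\, y_i\,(\beta_0 + x_i^{\top}\beta)
          - \log\bigl(1 + e^{\beta_0 + x_i^{\top}\beta}\bigr) \Bigr]
  \;+\; \lambda \Bigl[\, \tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^{2}
          + \alpha\,\lVert\beta\rVert_1 \Bigr]
```

Setting α = 1 gives the lasso and α = 0 gives ridge regression. Intermediate values shrink coefficients while still setting some of them exactly to zero, which is what allows the penalty to act as a variable selector.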
Results A history of gonorrhea and syphilis diagnoses, HIV tests in the past year, a history of visits to clinics identified as serving gay and bisexual men, and residence in MSM-dense neighborhoods (based on self-reported MSM) were all positively associated with being MSM. The selected model had a sensitivity of 72%, specificity of 94%, and AUC of 92%. Combining self-reported MSM (n=6,280) and predicted MSM (n=85,521), a total of 91,801 men (13% of the cohort) were classified as MSM.
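The three performance measures reported above can be computed from predicted probabilities and gold-standard labels as in the following minimal sketch. The labels, scores, and the 0.5 cut-off are toy values invented for illustration and are not study data. The function names are hypothetical, and the abstract does not say which threshold the authors applied.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = self-reported MSM, 0 = not (hypothetical values).
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.05]

threshold = 0.5  # assumed cut-off, for illustration only
predicted = [1 if s >= threshold else 0 for s in scores]

sens, spec = sensitivity_specificity(labels, predicted)
model_auc = auc(labels, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={model_auc:.3f}")
```

Sensitivity and specificity depend on the chosen cut-off, whereas AUC summarizes ranking performance across all cut-offs, which is why a model can be selected on AUC first and thresholded afterwards.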
Conclusion Applying a computational phenotyping method to administrative data identified 85,521 predicted MSM who, together with the 6,280 self-reported MSM, form a cohort that may be used to monitor and evaluate health outcomes and healthcare utilization. The sensitivity and specificity of this model were comparable to those of interviewer-administered self-report measures of sexual orientation.
Disclosure No significant relationships.