Original research
Development of a prognostic tool exploring female adolescent risk for HIV prevention and PrEP in rural South Africa, a generalised epidemic setting
Sarah Gabrielle Ayton (1,2), Martina Pavlicova (2), Hod Tamir (3), Quarraisha Abdool Karim (1,4)

  1. Department of Epidemiology, Columbia University Mailman School of Public Health, New York City, New York, USA
  2. Department of Biostatistics, Columbia University Mailman School of Public Health, New York City, New York, USA
  3. ICAP, Columbia University Mailman School of Public Health, New York City, New York, USA
  4. Centre for the AIDS Programme of Research in South Africa (CAPRISA), Nelson R Mandela School of Medicine, University of KwaZulu-Natal, Durban, KwaZulu-Natal, South Africa

Correspondence to Sarah Gabrielle Ayton, Department of Biostatistics, Columbia University Mailman School of Public Health, 722 W. 168th Street, 6th floor, Rm. 635, NY 10032, USA; sarah.ayton{at}


Objectives Adolescent females in sub-Saharan Africa bear a disproportionate burden of new HIV infections but have been excluded from prognostic research, such as developed risk calculators. This study examines whether validated risk calculators, which calculate HIV risk among sub-Saharan African women, can be modified to assess HIV risk among adolescent girls. The performance of selected risk variables from validated calculators and the literature was evaluated among adolescent females using modern advanced statistical tools.

Methods Risk variables for the updated tool were selected from the CAPRISA 007 (CAP007) trial (2010–2012) questionnaires. An initially HIV-seronegative cohort of rural South African female high school students (n=1049) aged 14–25 years was selected. The number and characteristics of latent factors, or dimensions, underlying selected variables were assessed using exploratory factor analysis (EFA). The updated tool’s effectiveness in identifying trends in adolescent risk was assessed with latent class analysis (LCA).

Results EFA identified two key latent factors: sexual behaviour and socioeconomic risk factors. Latent sexual behaviour risk influenced contraception use (0.883), perceived HIV risk (0.691) and pregnancy (−0.384). Latent socioeconomic risk influenced low HIV knowledge (0.371), financial dependence (0.532), prior HIV testing (−0.379) and alcohol use (−0.332). Using LCA, three underlying categories of adolescent females were identified: those with no, low and high risk of HIV (1.10%, 2.26% and 2.91% 1-year seroconversion rates, respectively). Herpes simplex virus serotype-2, condom contraception, alcohol use, pregnancy and age were significantly associated with higher risk class membership, while non-condom contraception was associated with lower risk class membership.

Conclusions Adolescent females are at unequal risk of acquiring HIV. Findings suggest the updated tool captures two main facets of adolescent characteristics and may identify differential risk. This work supports further investigation to inform development of targeted differentiated interventions and efficient prognostic tools for adolescents in high-risk settings.

  • hiv women
  • africa
  • adolescent
  • epidemiology (general)
  • prognostic indicators


Despite tremendous progress in developing HIV treatments and reducing infant acquisition, 40% of global HIV infections occur among young people aged 15–24 years (adolescents hereafter).1 Sub-Saharan Africa is a focal point in the ongoing HIV epidemic,2 3 in which females have an HIV acquisition rate more than double that of male peers.4 In South Africa, approximately 30% of new infections are among adolescent females,4 5 who are infected 5 years before males.2 4 The disproportionate burden of HIV among females reflects and perpetuates age–sex disparate or intergenerational HIV infection, in which HIV is transmitted intergenerationally from adult male to adolescent female sexual partners.6–8 Such relationships are prone to power imbalance and intimate partner violence, which decrease safe sex practices and increase HIV infection risk.9 10 Notwithstanding substantial HIV burden and vulnerability,11 adolescents have been largely excluded from groundbreaking research, discoveries and interventions.12 13

HIV prevention strategies for adolescents include abstinence, delay of sexual debut, consistent condom use and monogamy with HIV-seronegative partners.6 Coupled with partner power dynamics and HIV susceptibility,9 10 prevention strategies reliant on male partner cooperation fail to adequately prevent HIV transmission in this high priority population14 15; African females continue to comprise the largest proportion of new HIV infections globally.2 Research has suggested targeted administration of pre-exposure prophylaxis (PrEP) could disrupt HIV transmission and bypass partner cooperation obstacles as an efficient and effective female-initiated prevention strategy.16 While recent work has examined this possibility among African adult females through the development of risk calculating tools (including demographic, partner, behavioural and clinical characteristics),17 18 South African female adolescents have been overlooked, and remain vulnerable and underserved in the ongoing South African HIV epidemic.

To effectively intervene in the female adolescent population, risk calculators must be developed using adolescent-relevant HIV risk variables, identifying differential risk. While data on adolescents are scarce, the CAPRISA 007 (CAP007) trial provides a unique and valuable opportunity to explore HIV risk and behaviours among adolescent female students.19 20 These analyses use data from one of the highest HIV burden health districts in rural KwaZulu-Natal to assess an adapted risk tool, including characteristics from validated tools for HIV prediction in adults,17 18 as well as adolescent risk characteristics (knowledge,1 21 school attendance22 and perceived risk).23 First, we assess latent risk factors, or dimensions of adolescent risk, underlying risk variables captured with our tool. Second, we explore the differential groupings (latent subject classes) of adolescent females based on risk variables in our tool.

In this manuscript, we explore the efficacy of risk variables in predicting adolescent HIV infection to provide a foundation for enhanced prognostic medicine (risk calculators) and PrEP interventions.


The CAPRISA 007 (CAP007) trial, a randomised controlled trial (open-label, matched-pair design), evaluated the impact of a conditional cash incentives programme on preventing HIV and herpes simplex virus serotype-2 (HSV-2) infection among 9th and 10th grade students (n=3217) in 14 rural South African high schools from 2010 to 2012. Our cohort study includes female adolescent CAP007 participants, aged 14–25 years, who enrolled in 2010, were HIV-seronegative in 2011 and were followed up in 2012 (n=1049).

Study measures included HIV (HIV ELISA from Vironostika Uniform 11 plus O Assay, Biomerieux (Netherlands); HIV-positive samples were confirmed with SD Bioline HIV-1/2 ELISA V.3.0 kit (SD Standard Diagnostics, Korea)) and HSV-2 serology (HerpeSelect HSV-2 ELISA Kits (Focus Diagnostics, Cypress, California, USA) for qualitative detection of human IgG class antibodies to HSV-2 based on recombinant gG2), measured at baseline, 12 months and 24 months. Self-administered questionnaires assessed academic performance, demographic information and risk data at baseline, 12 months and 24 months. Further details about CAP007 have been published elsewhere.19 20

Risk variables used in analyses were derived from data collected in 2011, and HIV outcome status was obtained from 2012 data. Based on validated adult HIV risk calculators,17 18 age category, financial support, alcohol use, contraceptive use, pregnancy, prior HIV testing and HSV-2 serostatus were included in our tool. We also incorporated risk variables unique to adolescents, including HIV knowledge,1 21 school absence22 and perceived HIV risk,23 to account for potential dimensions of adolescent risk not captured by adult risk tools (online supplementary dictionary).

Statistical methods

Latent risk factors

We conducted exploratory factor analysis (EFA) to explore and gain insights into dimensions of adolescent risk captured by the updated tool. EFA explores the variance structure of correlation coefficients by analysing common variance and ignoring error variance among latent factors, represented by measured variables.24 We ran the data through a correlation matrix, orthogonally rotated components and performed extraction with principal axis factoring; preliminary factors were extracted by variable type. The WLSMV estimator was used to analyse the tetrachoric correlation matrix. Eigenvalue assessment showed three factors with eigenvalues greater than 1.0, indicating as many as three latent factors underlying the updated tool.
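The eigenvalue screen described above (counting eigenvalues greater than 1.0) can be illustrated with a minimal sketch. It is not the study's Mplus workflow: the 4×4 correlation matrix is a hypothetical toy example rather than the tetrachoric matrix from CAP007, and the hand-written Jacobi solver is only there to keep the example dependency-free (a numerical library routine would normally be used instead).

```python
import math

def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    for _ in range(max_sweeps):
        # Sum of squared off-diagonal entries; stop once it is negligible
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) entry
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def count_factors(corr, threshold=1.0):
    """Number of eigenvalues above the threshold (eigenvalue > 1.0 rule)."""
    eigenvalues = jacobi_eigenvalues(corr)
    return eigenvalues, sum(1 for ev in eigenvalues if ev > threshold)

# Hypothetical correlation matrix: two pairs of correlated items
toy_corr = [
    [1.0, 0.6, 0.0, 0.0],
    [0.6, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
]
eigenvalues, n_factors = count_factors(toy_corr)
print(eigenvalues, n_factors)
```

For this toy matrix the rule suggests two factors, since each correlated pair contributes one eigenvalue above 1.0.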

We ran EFA with one, two and three factors. The χ2 goodness-of-fit test assessed model fit. As this test is sensitive to sample size and data normality, we also examined the comparative fit index (CFI), Tucker-Lewis Index (TLI) and the root mean square error of approximation (RMSEA). Fit indices and cut points included RMSEA (≤0.05), CFI (≥0.90) and TLI (≥0.90) to indicate good and acceptable model fit.25 26 Indicators that appeared weak (<0.30) or displayed cross-loadings in the factorial solution were excluded from further analysis. The drug use variable was excluded from factor analyses due to low prevalence, which caused model failure to converge. The final two-factor model was chosen based on fit indices and meaningfulness of risk variable grouping. All factor analyses were performed in Mplus V.7.27
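The fit-index cut points listed above can be expressed as a simple check. The sketch below is illustrative only: the function name and the example fit statistics are hypothetical and are not the study's results.

```python
def assess_fit(rmsea, cfi, tli):
    """Apply the cut points stated in the text: RMSEA <= 0.05, CFI >= 0.90, TLI >= 0.90."""
    checks = {
        "RMSEA <= 0.05": rmsea <= 0.05,
        "CFI >= 0.90": cfi >= 0.90,
        "TLI >= 0.90": tli >= 0.90,
    }
    # A model is flagged acceptable only if every criterion is met
    return checks, all(checks.values())

# Hypothetical fit statistics for a candidate model (not from the paper)
checks, acceptable = assess_fit(rmsea=0.03, cfi=0.95, tli=0.92)
print(checks, acceptable)
```

Requiring all three criteria at once is a design choice of this sketch; the text also weighs the meaningfulness of the variable grouping when choosing the final model.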

Latent subject classes

Latent class analysis (LCA) investigates the underlying risk classes among female adolescent participants based on measured risk variable responses.28 This model-based clustering technique identifies subject subtypes from multivariate data and provides insight into the updated tool’s ability to identify differential risk among adolescent females. The number of underlying classes, class prevalences and probability that a subject falls within a particular class may be estimated by examining associations between response and latent class. Models were fit with up to four classes, and the final number of classes was selected based on class interpretability and the sample-size adjusted Bayesian information criterion (BIC) fit statistic.29 Since HIV status was not part of the class-defining variables, the frequency and proportion of HIV seroconversions were computed for each class. In the second step, we examined relationships between individual risk variables and each class using multinomial logistic regressions. This allowed us to identify risk variables significantly associated with class membership. Class analyses were performed in Mplus V.7.27 Multinomial logistic regressions were performed in R V.
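As an illustration of class enumeration by the sample-size adjusted BIC, the sketch below assumes the commonly used form −2·logL + k·ln((n + 2)/24), where k is the number of free parameters and n the sample size. The log-likelihoods and parameter counts are hypothetical placeholders, not values from the CAP007 models; only n=1049 comes from the text.

```python
import math

def sample_size_adjusted_bic(log_likelihood, n_params, n_obs):
    """Sample-size adjusted BIC: -2*logL + k*ln((n + 2) / 24)."""
    return -2.0 * log_likelihood + n_params * math.log((n_obs + 2) / 24.0)

def select_by_sabic(candidates, n_obs):
    """Return (best number of classes, dict of SABIC values); lower is better."""
    scores = {
        n_classes: sample_size_adjusted_bic(loglik, k, n_obs)
        for n_classes, (loglik, k) in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical (log-likelihood, free parameters) for 1- to 4-class models
candidates = {
    1: (-5200.0, 10),
    2: (-5100.0, 21),
    3: (-5050.0, 32),
    4: (-5045.0, 43),
}
best, scores = select_by_sabic(candidates, n_obs=1049)
print(best, scores)
```

With these placeholder values the three-class model has the lowest criterion, because the small likelihood gain of a fourth class does not offset the penalty for its extra parameters. In the study, interpretability of the classes was considered alongside the criterion.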

All hypothesis tests were two-sided and assessed on the 5% level of significance.


Among female adolescent participants (n=1109), 20 were HIV seropositive and 111 were HSV-2 seropositive in 2011; there were 18 HIV seroconversions and 55 HSV-2 seroconversions as of 2012 (table 1). Table 1 includes descriptive statistics for these CAP007 female participants. Between 2011 and 2012, 40 HIV-seronegative female participants were lost to follow-up. A total of 1049 female adolescents met the inclusion criteria for our analysis: female participants, aged 14–25 years, enrolled in 2010, HIV-seronegative in 2011 and followed up in 2012. Of our sample, most had no past-year contraception use, high HIV knowledge, prior HIV testing, a source of spending money, no past-year drug or alcohol use, no past-year pregnancy, attributed school absences to illness or had no past-year absences, self-identified as low or no risk of HIV infection, and were under 18 years.
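The sample flow above can be checked with simple arithmetic. The counts are taken from the text; the overall 1-year seroconversion proportion assumes that all 18 seroconversions occurred within the 1049 analysed participants, which the text implies but does not state explicitly.

```python
# Counts reported in the text
female_participants = 1109      # female adolescent participants
hiv_positive_2011 = 20          # excluded: HIV seropositive in 2011
lost_to_follow_up = 40          # excluded: HIV-seronegative but not followed up in 2012

analysis_sample = female_participants - hiv_positive_2011 - lost_to_follow_up

# Assumption: all 18 reported seroconversions fall within the analysis sample
hiv_seroconversions = 18
overall_rate_pct = round(100 * hiv_seroconversions / analysis_sample, 2)

print(analysis_sample, overall_rate_pct)
```

The result reproduces the analysis sample of n=1049 and gives an overall proportion of about 1.72%, which lies between the class-specific rates reported for the no-risk and high-risk classes.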

Table 1

Demographic information collected in 2011 among CAP007 adolescent females (enrolled in 2010) stratified by 2011 HIV serostatus

Latent risk factors

Based on model fit indices for the three models, geomin rotated factor loadings and parsimony, the best representation of risk variables is within the two-factor approach (online supplementary table 1). In the two-factor approach, there is a clear, meaningful loading of risk variables on two latent factors relating to HIV risk. Contraception, pregnancy and perceived HIV risk had high and significant loadings (0.883, −0.384 and 0.691, respectively) on the first factor (figure 1); HIV knowledge, prior HIV testing, financial dependence, and alcohol use had high and significant loadings (0.371, −0.379, 0.532 and −0.332, respectively) on the second factor. The first factor was termed the sexual risk factor, as contraception use and pregnancy are directly related with sexual activity, as is perceived HIV risk. The second factor was defined as the socioeconomic risk factor, as knowledge, prior testing, financial dependence and alcohol use are all related with education, financial status and sociocultural norms. HSV-2 status and school absence had low factor loadings (<0.30) on both factors; these risk variables were dropped from analysis. We note a low correlation between sexual and socioeconomic risk factors (R=−0.133), indicating probable factor independence.

Figure 1

Geomin rotated factor loadings and standard errors of the final two-factor EFA model.

Factor loadings of the two-factor model indicated positive relationships between sexual behaviour risk and both condom and non-condom contraception use (0.883) and high and low perceived HIV risk (0.691); conversely, past-year pregnancy (−0.384) was negatively associated with sexual behaviour risk (figure 1). Factor loadings indicated positive associations between socioeconomic risk and low HIV knowledge (0.371) and financial dependence (0.532); loadings for prior HIV testing (at least once) (−0.379) and alcohol use in the past year (at least once) (−0.332) indicated negative relationships with socioeconomic risk (figure 1). Analyses stratified by age for adolescents <18 and adolescents ≥18 yielded similar factor loadings to the unstratified factor analysis.
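The reported loadings can be summarised programmatically to show how the 0.30 retention threshold and the sign of each loading are read. The loadings below are the values reported in the text; the variable labels and the helper function are illustrative, and the loadings of the dropped variables (HSV-2 status and school absence) are not reported, so they are omitted.

```python
# Loadings reported for the final two-factor model
LOADINGS = {
    "sexual behaviour risk": {
        "contraception use": 0.883,
        "perceived HIV risk": 0.691,
        "past-year pregnancy": -0.384,
    },
    "socioeconomic risk": {
        "low HIV knowledge": 0.371,
        "financial dependence": 0.532,
        "prior HIV testing": -0.379,
        "past-year alcohol use": -0.332,
    },
}

def summarise_loadings(loadings, threshold=0.30):
    """List retained indicators (|loading| >= threshold) with their direction."""
    rows = []
    for factor, items in loadings.items():
        # Strongest indicators first within each factor
        for variable, value in sorted(items.items(), key=lambda kv: -abs(kv[1])):
            if abs(value) < threshold:
                continue  # weak indicator: would be dropped
            direction = "positive" if value > 0 else "negative"
            rows.append((factor, variable, value, direction))
    return rows

rows = summarise_loadings(LOADINGS)
for row in rows:
    print(row)
```

All seven reported indicators clear the threshold; three of them (pregnancy, prior HIV testing and alcohol use) load negatively on their factor.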

Latent subject classes

To assess differential risk in the population, we classified the female adolescents using LCA and found, based on the indices, the most parsimonious categorisation of female adolescents was with a three-class split of the sample (online supplementary table 2); the two-class model did not distinguish between no-risk and low-risk participants, while the four-class model further divided the high-risk group into two non-meaningful groups. Further, the three-class model had the lowest BIC, suggesting the best split.

In the three-class model, there is a clear and meaningful grouping of female adolescents, based on the distribution of risk variable proportions, into three classes relating to HIV risk: no risk, low risk and high risk (table 2). We characterise the three classes as follows: class 1 most clearly represented those at almost no risk of HIV infection; class 1 is likely representative of younger adolescents with little to no sexual activity or drug use, and was named the ‘no risk’ group. Conversely, class 2 captured those at the highest risk of HIV infection; this group includes older adolescents who have unprotected sex and a history of substance use, and was named the ‘high risk’ group. Class 3 described those at low risk of HIV infection, and includes adolescents who practise safe sex and have minimal history of substance use, and was named the ‘low risk’ group. Interestingly, among those classified as high risk, 2.91% became HIV seropositive in the following year; 2.26% of those designated as low risk and 1.10% of those classified as no risk became HIV positive in the following year (table 2). Due to the small number of seroconversions in the sample, the differential relationship between class and seroconversion could not be statistically assessed.

Table 2

Class risk variable proportions from the three-class LCA model among the sample of CAP007 adolescent females with HIV-negative serostatus in 2011 (n=1049)

In the investigation of variable associations with the three classes, multinomial logistic regression indicated HSV-2 status, alcohol use, pregnancy and age category as significantly associated with class 3, ‘low risk’, membership compared with class 1, ‘no risk’ (table 3). Risk variables significantly associated with class 3 membership were predictive of low HIV risk compared with no HIV risk. Risk variables including HSV-2 status, contraception, alcohol use, pregnancy and age category indicated significant associations with membership in class 2, ‘high risk’, compared with class 1 (table 3). All risk variables significantly associated with class 2 membership compared with class 1 were predictive of high HIV risk compared with no HIV risk, with the exception of use of non-condom contraception, which had a protective effect. HIV knowledge, prior HIV testing and financial dependence were not significantly associated with class categorisation.

Table 3

Univariable multinomial logistic analysis of latent class membership among CAP007 adolescent females with HIV-negative serostatus in 2011 (n=1049)


In South Africa, safe sex educational interventions have not been dramatically successful, mostly due to reliance on partner cooperation and sexual relationship equality9 10 14 15; few studies have examined appropriate interventions for adolescent females. Recent work has indicated that PrEP,12 16 17 the only female-initiated HIV prevention technology, is a viable means of reducing the HIV/AIDS epidemic in regions such as South Africa. By identifying adolescent females who are at high risk of HIV acquisition, preventative treatment and interventions (ie, PrEP) may be initiated through ongoing programmes, clinics and primary care providers to mediate the burden of HIV acquisition among sub-Saharan African females.

This is the first study, to our knowledge, assessing risk variables in South African adolescent females, using exploratory factors and latent classes. We specifically assess risk variables relevant to the adolescent population through an updated specialised tool. CAP007 provided the unique opportunity to explore risk variables, adapted from validated adult HIV risk calculators, in the tool and their association with HIV acquisition risk at 1 year. In a region with high epidemic burden, we find that adolescent females are at unequal risk of HIV acquisition and experience vulnerabilities distinct from their adult counterparts.

When exploring risk variables appropriate to female adolescents, two factors emerged as underlying risk variables in the updated tool: sexual behaviour risk factor (contraception, pregnancy and perceived HIV risk) and socioeconomic risk factor (HIV knowledge, prior HIV testing, financial dependence and alcohol use). These results are intuitive, as contraception and pregnancy are directly linked with sexual activity, and perceived risk is likely associated with internalised risk regarding sexual activity. Further, knowledge, prior testing and financial dependence are known socioeconomic risk factors for HIV transmission, and substance use patterns are directly related with HIV transmission. The role of adolescent characteristics in HIV risk factors is distinct from the adult population6 17 18 21 and suggests these risk variables encompass multiple dimensions of latent HIV risk among adolescents.

This study identified associations between adolescent characteristics and HIV risk distinct from previous studies of risk in adults.17 18 21 The negative relationship between past-year pregnancy and latent sexual behaviour risk factor in adolescents is opposite to what we expect among adults, as pregnancy influences biological susceptibility to HIV infection.17 Pregnant adolescents may live with family rather than their sexual partner(s) and experience reduced sexual activity (opportunity for sexual transmission) during pregnancy. Prior HIV testing and past-year alcohol use had negative relationships with latent socioeconomic risk. Among adults, prior testing approximates behavioural responses to internalised risk and is associated with higher HIV risk.18 This was not observed among adolescents and may represent positive proactive sexual health behaviours rather than reactive behaviours. Prior testing may result from sexual autonomy and higher socioeconomic status among adolescents. Alcohol use is associated with increased risk of HIV in adults.21 However, due to low rates of drug and alcohol use, alcohol use may approximate protective effects of age on socioeconomic risk among adolescents.

Following identification of latent risk factors, adolescents were classified into three latent classes: class of those with essentially no risk, class of those with low risk and class of those with high risk of acquiring HIV. HIV seroconversion prevalences indicated preliminary support of classes identified by LCA; while rate of seroconversion increased steadily with risk class membership, statistical tests could not be performed. Multinomial logistic regression analyses indicated non-condom contraception protected against HIV risk, while condom use predicted increased risk. This contradicts the association observed in adults6 and may indicate non-condom contraception among adolescents is an indicator of sexual autonomy, high socioeconomic status or partner monogamy, which protect against HIV. Conversely, use of non-condom contraception may result from fear or uncooperative partners; non-condom contraception may instead be driven by some other characteristic driving reduced HIV risk.


We were unable to statistically assess class differences in seroconversion rates. Findings from such analyses would have been clouded by type I error and cluster effects as an artefact of the small number of seroconversions observed in the data. Future research should assess the updated tool in larger datasets to further determine its ability to distinguish risk among adolescents.

It is noteworthy that the ‘no risk’ group from latent class analysis showed seroconversions, indicating some degree of HIV risk is not captured in the data; there may be unmeasured variables or changes in measured variables between survey administration and HIV testing at 1 year. Future work should examine changes in variables over time along with changes in HIV status. Until such progress in identifying HIV risk has been made, it is important that those considered ‘low risk’ still be included in HIV prevention interventions.

Further, it is possible that pregnant and HIV-infected students are more likely to drop out of school; dynamics influencing school enrolment may have contributed to the low prevalence of HIV in the dataset, which may not fully represent the adolescent population. However, since most South African adolescents are enrolled in school for at least the first 2 years of high school, our analysis provides an opportunity to understand HIV risk trajectory among high school students.


Our research assessing the updated risk tool recognises clear differences in HIV vulnerability among adolescent females, as a population distinct from adults. Our examination of adolescent HIV risk, latent factor structure and latent classes of South African female adolescents provides insights into HIV susceptibility and the performance of our updated tool among female adolescents. This work is a first step towards the development of specialised risk tools for female adolescents and recognises their unique role in HIV transmission. As the HIV epidemic continues in South Africa, we underscore the importance of disrupting transmission before the epidemic can reach younger generations, while also protecting the most vulnerable. Such epidemic disruption may use advances in risk calculators, such as ours, to identify adolescents at high risk of HIV infection and provide PrEP and preventive interventions. We hope this and future research will prioritise the adolescent population, and aid in the development of prognostic tools that predict HIV risk and administer PrEP among adolescent females to end the HIV epidemic in South Africa as a public health threat.

Patient and Public Involvement

It was not appropriate or possible to involve patients or the public in this work.

Key messages

  • South African female adolescents are particularly vulnerable to HIV infection and display distinct risk factors from adult counterparts.

  • The updated risk tool captures latent sexual and socioeconomic risk factors, and may identify differential risk among adolescent females.

  • Comprehensive prevention research on adolescent females and their HIV vulnerabilities is needed to further develop specialised risk tools.


The authorship team would like to thank all the study staff, the CAPRISA Vulindlela Community Research Support Group, the CAPRISA School Research Support Groups, the Vulindlela community, Mgungundlovu District Education and Health Offices, Provincial Departments of Health and Education, members of Zimnande Zonke, the Vulindlela school circuit management, principals, teachers, schools governing bodies, parents and students for their willingness to contribute to and participate in the study. We thank our funders, MIET Africa, for their support. Furthermore, we acknowledge the original CAPRISA 007 team (Professor Quarraisha Abdool Karim, Professor Ayesha Kharsany, Dr Francois von Loggerenberg, Dr Janet Frohlich, Fanele Ntombela, Dr Kerry Leask, Dr Anneke Grobler, Natasha Samsunder and Professor Salim Abdool Karim) for allowing us access to the behavioural and laboratory data that made this analysis possible. Sincere thanks to Lauren Wilson, Paige Xu, Anan Zhou, James Ayton, Elizabeth Ayton and Mario Garcia Pompermayer for critical review of the manuscript.



  • Handling editor Sevgi O Aral

  • Contributors SGA, QAK, MP and HT contributed to the conception and design of the study. SGA and MP contributed to the analysis of data. SGA drafted the manuscript with the help of MP. All authors contributed to the interpretation of the study, revised the manuscript, and critically revised and approved the final manuscript.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Ethics approval University of KwaZulu-Natal Biomedical Ethics Committee (BF105/010 and BE523/14).

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement Data are available on reasonable request.