UNAIDS and country analysts use a simple infectious disease model, embedded in the Estimation and Projection Package (EPP), to generate annual updates on the global HIV/AIDS epidemic. Our objective was to develop modifications to the current model that improve fit to recently observed prevalence trends across countries.

Our proposed alternative to the current EPP approach simplifies the model structure and explicitly models changes in average infection risk over time, operationalised using penalised B-splines in a Bayesian framework. We also present an alternative approach to initiating the epidemic that improves standardisation and efficiency, and add an informative prior distribution for changes in infection risk beyond the last data point that enhances the plausibility of short-term extrapolations.

The spline-based model produces better fits than the current model to observed prevalence trends in settings that have recently experienced levelling or rising prevalence following a steep decline, such as Uganda and urban Rwanda. The model also predicts a deceleration of the decline in prevalence for countries with recent experience of steady declines, such as Kenya and Zimbabwe. Estimates and projections from our alternative model are comparable to those from the current model where the latter performs well.

A more flexible epidemiological model that accommodates changing infection risk over time can provide better estimates and short-term projections of HIV/AIDS incidence, prevalence and mortality than the current EPP model. The alternative model specification can be incorporated easily into existing analytical tools that are used to produce updates on the global HIV/AIDS epidemic.

Many countries lack sufficient health information systems to measure the number of individuals living with HIV/AIDS, the rate of new infections, and the need for antiretroviral therapy (ART). To fill this information gap, the UNAIDS Reference Group on Estimates, Modelling and Projections developed, and continues to refine, an epidemiological model for estimation and short-term extrapolation of HIV/AIDS trends from limited surveillance data.

While the Reference Group model provides a relatively simple tool that produces plausible fits to a wide variety of observed patterns in surveillance data, a number of specific patterns have been challenging to reproduce. Examples include countries in which observed prevalence has declined sharply and then levelled off, and countries in which prevalence has been steadily increasing or decreasing.

The rapid rise in prevalence in the early years of many African epidemics is consistent with a high initial force of infection on average, as the epidemic begins to spread among a population with relatively high-risk behaviours. However, if this high average infection rate were to persist as HIV expanded into the general population, prevalence would reach unrealistically high levels. To address this dynamic, the Reference Group model partitions the uninfected population into two risk groups, ‘susceptible, not at-risk’ and ‘susceptible, at-risk’. The fraction of the population initially in the ‘susceptible, at-risk’ compartment constrains the peak prevalence level. The rate at which prevalence declines after its peak is determined by the balance between new infections and mortality, and can be modulated in the Reference Group model to a limited degree by adjusting the fraction of new adults entering the ‘susceptible, at-risk’ population.

Modifications to the Reference Group model that enable better fits to prevalence data should meet several criteria. First, a modified model should maintain the explicit connection in the current model between prevalence of infected people and the risk of infection among uninfected people. Models that fit curves directly to prevalence data without explicitly modelling the underlying infection dynamics might, for example, produce a prevalence curve that would imply a negative incidence rate. Another argument for maintaining an explicit epidemiological model (as opposed to a curve-fitting approach) is to accommodate estimation of incidence and mortality patterns, scenario analyses and future predictions. Second, parsimony is important as many settings do not have the data required to parameterise more complex models, such as those that incorporate age structure, detailed patterns of sexual behaviour, or expanded risk group structure.

Previously, we suggested a relatively parsimonious modification to the Reference Group model that would eliminate the current partitioning of the susceptible population into ‘at-risk’ and ‘not-at-risk’ segments, but would allow the average infection rate to change over time.

For many generalised epidemics, such as those in sub-Saharan Africa, the primary source of time series data on HIV prevalence is a system of sentinel surveillance sites that undertake anonymous HIV-prevalence testing of blood samples from pregnant women attending antenatal clinics (ANC).

We fit the model separately to a total of 18 populations, distinguishing urban from rural regions in nine countries: Angola, Burundi, Ethiopia, Gabon, Kenya, Lesotho, Rwanda, Uganda and Zimbabwe. Rwanda and Uganda offer two important examples in which the Reference Group model fails to capture recent trends in prevalence.

Our proposed alternative to the current EPP model explicitly represents changes in infection risks over time to better approximate patterns of HIV spread in generalised epidemics. We simplify the population structure of the Reference Group model by collapsing the ‘susceptible, at-risk’ and ‘susceptible, not at-risk’ compartments into one uninfected (and at-risk) sub-population. In line with the modified structure, we remove two corresponding parameters from the Reference Group model: $f_0$, which governs the initial split between at-risk and not-at-risk entrants, and $\phi$, which modulates the fraction of new adults entering the at-risk group.

The model can be described by three differential equations, following the general notation for the Reference Group model used in EPP 2009.

The key difference in the parameterisation of this model compared with that in the current Reference Group model is the use of a time-varying function for infection risks, $r(t)$, in place of a constant infection rate parameter.
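As a rough illustration of what a time-varying infection risk does in a compartmental model, the sketch below integrates a toy system with an uninfected fraction `z`, an infected fraction `y`, and an incidence term proportional to `r(t) * z * y / n`. It is not the EPP 2009 specification: the two-compartment layout, the constant entry and mortality rates, the Euler step and the shape of `declining_risk` are all assumptions chosen only to make the example run.

```python
def simulate_prevalence(r, years, pulse=1e-4, entry=0.03, mu=0.02,
                        mu_hiv=0.10, steps_per_year=10):
    """Euler-integrate a toy two-compartment model and return yearly prevalence.

    r is a function giving the infection risk at time t (years since the start).
    All rate values are illustrative assumptions, not EPP inputs.
    """
    z, y = 1.0 - pulse, pulse          # uninfected and infected fractions
    dt = 1.0 / steps_per_year
    prevalence = []
    for year in range(years):
        for step in range(steps_per_year):
            t = year + step * dt
            n = z + y
            new_infections = r(t) * z * y / n
            dz = entry * n - new_infections - mu * z
            dy = new_infections - (mu + mu_hiv) * y
            z += dt * dz
            y += dt * dy
        prevalence.append(y / (z + y))  # prevalence at the end of each year
    return prevalence


def declining_risk(t):
    # Hypothetical r(t): falls linearly from 0.8 to 0.2 over 20 years, then flat.
    return max(0.2, 0.8 - 0.03 * t)


constant = simulate_prevalence(lambda t: 0.8, years=35)
flexible = simulate_prevalence(declining_risk, years=35)
```

With these made-up rates, the constant-risk run climbs to a far higher prevalence than the run in which the risk declines, which is the behaviour that motivates letting infection risk change over time.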

We make another important modification to the Reference Group model by changing the way in which the model captures the initial phase of an epidemic. Fitting the Reference Group model currently requires estimation of the year ($t_0$) in which a fixed seed value of 0.0025 initiates the epidemic. In our proposed specification, an epidemic is initiated in 1975 with a small pulse of infected individuals, the size of which is a parameter to be estimated with a broad uniform prior distribution spanning the range $10^{-13}$ to 0.0025. Given the large uncertainty about when HIV first appeared in human populations, the timing of early epidemic growth is thus governed by the estimated size of this pulse rather than by $t_0$.

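There are several possible choices of spline functions for modelling infection risks over time. We use third-degree B-splines with seven evenly spaced basis functions; the curve for $r(t)$ is a linear combination of these basis functions, with coefficients ($\beta_1$ through $\beta_7$) serving as weights.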

Example of a spline-based curve for infection risks over time (solid line) using third-degree B-splines composed of seven evenly spaced basis functions (dashed curves labelled 1 through 7). The spline curve is a linear combination of the basis functions and their coefficients (labelled $\beta_1$ through $\beta_7$).
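The construction of a curve from seven evenly spaced third-degree basis functions can be sketched with the standard Cox-de Boor recursion. The helper names, the 1975 to 2010 time span and the coefficient values below are hypothetical. The knot placement (three extra knots on each side of the interval) is one common way to obtain evenly spaced basis functions and is an assumption here.

```python
def bspline_basis(i, degree, knots, t):
    """Cox-de Boor recursion: value at t of the i-th B-spline basis function."""
    if degree == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left_den = knots[i + degree] - knots[i]
    right_den = knots[i + degree + 1] - knots[i + 1]
    left = 0.0
    if left_den != 0:
        left = (t - knots[i]) / left_den * bspline_basis(i, degree - 1, knots, t)
    right = 0.0
    if right_den != 0:
        right = ((knots[i + degree + 1] - t) / right_den
                 * bspline_basis(i + 1, degree - 1, knots, t))
    return left + right


def uniform_knots(t_start, t_end, n_basis=7, degree=3):
    """Evenly spaced knots so that n_basis functions cover [t_start, t_end]."""
    spacing = (t_end - t_start) / (n_basis - degree)
    return [t_start + (k - degree) * spacing for k in range(n_basis + degree + 1)]


def spline_curve(t, coefficients, knots, degree=3):
    """Linear combination of the basis functions, with coefficients as weights."""
    return sum(c * bspline_basis(i, degree, knots, t)
               for i, c in enumerate(coefficients))


knots = uniform_knots(1975.0, 2010.0)
weights = [0.9, 0.8, 0.5, 0.3, 0.25, 0.25, 0.3]  # hypothetical coefficients
r_1990 = spline_curve(1990.0, weights, knots)
```

Because the basis functions sum to one inside the interval, setting every coefficient to the same value yields a flat curve at that value, and at most four basis functions are non-zero at any point in time.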

In line with the current strategy for estimating model parameters and characterising uncertainty, we adopt a Bayesian approach to fitting the model to surveillance data. Smoothness of the spline curve is imposed through a second-order random walk prior on the coefficients, $\beta_i = \beta_{i-1} + (\beta_{i-1} - \beta_{i-2}) + \varepsilon_i$, where the priors for $\beta_1$ and $\beta_2$ are assumed to be proportional to a constant. Note that if all of the $\varepsilon_i$ were zero, the coefficients would lie on a straight line. With $\varepsilon_i \sim N(0, \tau^2)$, the amount of smoothness is determined by the variance parameter $\tau^2$, as smaller values for $\tau^2$ will produce smaller values for $\varepsilon_i$, which in turn imply a smoother function. The parameter $\tau^2$ is estimated during model fitting, and we use the common uninformative prior $\tau^2 \sim \text{inverse-gamma}(0.001, 0.001)$.
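The effect of the variance parameter on smoothness can be seen by simulating from a second-order random walk, in which each coefficient equals the previous one plus the previous increment plus a normal deviation. The sketch below assumes that form of penalty. The function names, starting values and variance values are hypothetical, and the log prior is given only up to the flat priors on the first two coefficients.

```python
import math
import random


def sample_rw2_coefficients(n_coef, tau2, first, second, rng):
    """Draw coefficients from a second-order random walk with variance tau2."""
    coefs = [first, second]
    sd = math.sqrt(tau2)
    for _ in range(n_coef - 2):
        deviation = rng.gauss(0.0, sd)
        coefs.append(coefs[-1] + (coefs[-1] - coefs[-2]) + deviation)
    return coefs


def second_differences(coefs):
    """Deviations from a straight-line continuation of the previous two values."""
    return [coefs[i] - 2.0 * coefs[i - 1] + coefs[i - 2]
            for i in range(2, len(coefs))]


def rw2_log_prior(coefs, tau2):
    """Log density of the deviations under independent N(0, tau2) terms."""
    return sum(-0.5 * math.log(2.0 * math.pi * tau2) - d * d / (2.0 * tau2)
               for d in second_differences(coefs))


# Same random stream, two hypothetical variance values.
smooth = sample_rw2_coefficients(7, 1e-4, 0.50, 0.45, random.Random(1))
rough = sample_rw2_coefficients(7, 1.0, 0.50, 0.45, random.Random(1))
```

A small variance keeps the second differences close to zero, so the coefficients stay near a straight line. A large variance lets them wander, which is why estimating the variance amounts to estimating the degree of smoothing.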

While the primary objective of the model described here is to estimate past epidemic trends for reporting on the current status of HIV/AIDS epidemics, one of the required functions of the model is to make short-term projections beyond the last observed data point in settings that lack data for the most recent reporting year(s). There is a range of methodological challenges in extrapolating time series beyond the last observed data point, including both general challenges and certain challenges specific to spline models. To address these, we place an informative prior distribution on changes in $r(t)$ beyond the last data point, whose mean depends on modelled prevalence in the last year of data. Variance of the distribution is set equal to the mean of the squared deviations between

We note that this prior is intended only to ensure that most curves for

Following the work of Bao and Raftery, we use incremental mixture importance sampling (IMIS) for model fitting, which is a Bayesian algorithm for simulating posterior distributions.
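IMIS itself is not reproduced here. As a simpler relative, the sketch below shows plain sampling-importance-resampling, which corresponds roughly to an initial importance-sampling stage: draw parameters from the prior, weight each draw by its likelihood, and resample in proportion to the weights. The incremental mixture steps that give IMIS its name are omitted. The one-parameter binomial problem, the prior range and the counts are hypothetical.

```python
import math
import random


def log_binomial_likelihood(p, positives, tested):
    """Binomial log-likelihood kernel for prevalence p (constant term dropped)."""
    return positives * math.log(p) + (tested - positives) * math.log(1.0 - p)


def sir_posterior(log_likelihood, prior_draws, n_resample, rng):
    """Weight prior draws by likelihood, then resample to approximate the posterior."""
    log_w = [log_likelihood(x) for x in prior_draws]
    largest = max(log_w)                       # subtract the max for stability
    raw = [math.exp(v - largest) for v in log_w]
    total = sum(raw)
    weights = [v / total for v in raw]
    resampled = rng.choices(prior_draws, weights=weights, k=n_resample)
    return resampled, weights


rng = random.Random(0)
prior_draws = [rng.uniform(0.001, 0.5) for _ in range(5000)]
resampled, weights = sir_posterior(
    lambda p: log_binomial_likelihood(p, positives=30, tested=200),
    prior_draws, n_resample=2000, rng=rng)
posterior_mean = sum(resampled) / len(resampled)
```

When the posterior is much narrower than the prior, only a small share of prior draws carry meaningful weight. Adding new proposal components near high-weight draws, as incremental mixture schemes do, is one way to address that inefficiency.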

We compared the alternative models in terms of their observed fit to surveillance data from all sites. The performance of the models was also evaluated based on more formal criteria. To compare the relative fit of the spline-based and Reference Group models to ANC data, we computed Bayes factors. We also examined the sensitivity of results to the choice of prior for $\tau^2$, and we visually compared posterior distributions of each model parameter to the distribution of initial draws for the IMIS algorithm.
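A Bayes factor is the ratio of two models' marginal likelihoods. One simple way to approximate a marginal likelihood is to average the likelihood over draws from the prior, as in the sketch below. This is an illustrative estimator under assumed toy models and is not necessarily the computation used for the comparison described above. The data (3 positives out of 10) and the two priors are hypothetical.

```python
import math
import random


def log_marginal_likelihood(log_likelihood, prior_draws):
    """Log of the mean likelihood over prior draws, computed via log-sum-exp."""
    log_l = [log_likelihood(x) for x in prior_draws]
    largest = max(log_l)
    mean_scaled = sum(math.exp(v - largest) for v in log_l) / len(log_l)
    return largest + math.log(mean_scaled)


def log_kernel(p, positives=3, tested=10):
    """Binomial log-likelihood kernel for the toy data."""
    return positives * math.log(p) + (tested - positives) * math.log(1.0 - p)


rng = random.Random(7)
# Model A: prevalence uniform on (0, 1); model B: prevalence uniform on (0.5, 1).
draws_a = [rng.uniform(1e-9, 1.0 - 1e-9) for _ in range(20000)]
draws_b = [rng.uniform(0.5, 1.0 - 1e-9) for _ in range(20000)]
log_ml_a = log_marginal_likelihood(log_kernel, draws_a)
log_ml_b = log_marginal_likelihood(log_kernel, draws_b)
log_bayes_factor = log_ml_a - log_ml_b  # positive values favour model A
```

For model A the exact value of the integral is 1/1320, which gives a convenient check on the Monte Carlo estimate. Working on the log scale avoids underflow when likelihoods are very small, as they typically are with many surveillance sites.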

As shown by the illustrative examples in

Median prevalence with 95% credible intervals (in black) for the spline-based model for

Posterior predictive checks suggest that the spline-based model predicts the observed data well across and within sites.

Posterior predicted prevalence versus observed prevalence across all antenatal clinic sites for urban and rural Uganda, along with posterior predicted site-level prevalence for four selected antenatal clinics (red indicates observed site prevalence, black indicates median and 95% prediction intervals for predicted site prevalence).
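The logic of a posterior predictive check can be sketched as follows: for each posterior draw of prevalence, simulate the number of positive tests at a site, then ask whether the observed site prevalence falls inside the resulting prediction interval. The binomial sampling model, the posterior draws, the sample size and the observed value below are all hypothetical stand-ins, and the surveillance likelihood used in practice may differ.

```python
import random


def predictive_interval(posterior_prevalence, n_tested, rng, level=0.95):
    """Simulate replicated site prevalence; return (lower, median, upper)."""
    replicated = []
    for p in posterior_prevalence:
        positives = sum(1 for _ in range(n_tested) if rng.random() < p)
        replicated.append(positives / n_tested)
    replicated.sort()
    tail = (1.0 - level) / 2.0
    last = len(replicated) - 1
    lower = replicated[int(round(tail * last))]
    median = replicated[int(round(0.5 * last))]
    upper = replicated[int(round((1.0 - tail) * last))]
    return lower, median, upper


rng = random.Random(42)
# Hypothetical posterior draws of prevalence for one site-year.
draws = [min(max(rng.gauss(0.12, 0.01), 0.0), 1.0) for _ in range(1000)]
lower, median, upper = predictive_interval(draws, n_tested=300, rng=rng)
observed = 0.11  # hypothetical observed site prevalence
covered = lower <= observed <= upper
```

Repeating this for every site-year and counting how often the observed value is covered gives a rough sense of whether the model's predictive uncertainty is calibrated.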

The IMIS algorithm was largely insensitive to initial conditions: posterior projections from the algorithm started with two different random seeds yielded similar parameter estimates. Patterns of $r(t)$ under a $\tau^2 \sim \text{uniform}(0, 30)$ prior were similar to those under the $\tau^2 \sim \text{inverse-gamma}(0.001, 0.001)$ prior model, with some differences in early-stage estimates for urban Burundi and rural Rwanda.

The model we propose here incorporates an informative Bayesian prior probability distribution for projections of infection risk beyond the last data point.

Median prevalence with 95% credible intervals for the spline-based model with (in black) and without (in blue) a prior for the future behaviour of infection risk.

Median prevalence with 95% credible intervals for out-of-sample prevalence predictions for the spline-based model (in blue), with median fit for the full-sample results added for comparison (in black), for four selected settings. Data for the last 3 years of observation were excluded for generating out-of-sample predictions (denoted by the vertical dashed line).

As rising coverage of ART makes interpretation of trends in prevalence increasingly complicated, there has been a corresponding shift in emphasis towards measurement and modelling of incidence. This shift is reflected in the most recent changes to the suite of tools used by UNAIDS and country analysts in their routine updates on the status of HIV/AIDS epidemics.

Plots of

Our investigation into relaxing the UNAIDS Reference Group model's assumption of a constant infection rate parameter over time suggests that it is feasible to obtain better fits to recent prevalence data, as evidenced by the flexible model's ability to capture recent trends in Uganda and Rwanda. The model also shows promise in avoiding projected prevalence declining towards zero for epidemics like that in Kenya. Using splines to model

Several limitations in the proposed model should be noted. The spline-based approach in the formulation presented here yielded divergent out-of-sample predictions in some countries. It is worth noting, however, that the out-of-sample prediction errors typically occurred in settings where predictions were made in close proximity to an epidemic peak. For the intended use of this model, most generalised epidemics have advanced sufficiently beyond the peak that the required out-of-sample predictions should be less vulnerable to error. The choice of prior for

An important limitation in both the flexible model and the Reference Group model is that they focus on average rates of infection and mortality across the entire adult population, thus summarising the dynamics of HIV spread through a heterogeneous population with a few equations instead of explicitly modelling the distribution of risk behaviours across individuals and accounting for the age structure of the population.

In comparison to the Reference Group model, the spline-based approach imposes somewhat greater computational demands. With seven spline basis functions there are a total of nine parameters to estimate, compared with four in the Reference Group model. On average the IMIS algorithm needed to calculate twice as many likelihoods when fitting the spline-based model as when fitting the Reference Group model. The relatively common shape of

By obtaining better projections of HIV/AIDS epidemic dynamics, a modification to the Reference Group model that allows

The authors gratefully acknowledge helpful inputs from Leontine Alkema, Le Bao, Tim Brown, Peter Ghys, Chris Paciorek, Adrian Raftery and Karen Stanecki.

This study was supported in part by funding from the UNAIDS Reference Group on Estimates, Modelling and Projections, and by a T32 training grant from the National Institute of Allergy and Infectious Diseases (AI 007433).

None.

DRH led the design of the study, development of the model, data acquisition, analysis and interpretation, and wrote the first draft of the manuscript. AMZ and JKH provided methodological input on model development and data analysis and contributed to revision of the manuscript. JAS conceived the study and contributed to design, model development, analysis, interpretation and writing of the manuscript.

Not commissioned; externally peer reviewed.