Karen Stanecki

Peter D Ghys

Geoff P Garnett

Catherine Mercer

We previously developed a flexible specification of the UNAIDS Estimation and Projection Package (EPP) that relied on splines to generate time-varying values for the force of infection parameter. Here, we test the feasibility of this approach for concentrated HIV/AIDS epidemics with very sparse data and compare two methods for making short-term future projections with the spline-based model.

Penalised B-splines are used to model the average infection risk over time within the EPP 2011 modelling framework, which includes antiretroviral treatment effects and CD4 cell count progression, and is fit to sentinel surveillance prevalence data with a Bayesian algorithm. We compare two approaches for future projections: (1) an informative prior related to equilibrium prevalence and (2) a random walk formulation.

The spline-based model produced plausible fits across a range of epidemics, which included 87 subpopulations from 14 countries with concentrated epidemics and 75 subpopulations from 33 countries with generalised epidemics. The equilibrium prior and random walk approaches to future projections yielded similar prevalence estimates, and both performed well in tests of out-of-sample predictive validity for prevalence. In contrast, in some cases the two approaches varied substantially in estimates of incidence, with the random walk formulation avoiding extreme changes in incidence.

A spline-based approach to allowing the force of infection parameter to vary over time within EPP 2011 is robust across a diverse array of epidemics, including concentrated ones with limited surveillance data. Future work on the EPP model should consider the impact that different modelling approaches have on estimates of HIV incidence.

UNAIDS, working with country analysts, currently uses the Estimation and Projection Package (EPP) to estimate and predict trends in HIV incidence, prevalence and mortality.

In this paper, we extend our previous work

We fit models to sentinel surveillance time series data on prevalence, as is typically done by UNAIDS when estimating epidemic trajectories.

As described elsewhere in this supplement, EPP 2011 contains a flexible modelling option that allows the force of infection parameter to vary over time. In the spline-based model, smoothness is imposed through a penalty on the spline coefficients (β_{i}), expressed as: β_{i} = β_{i−1} + (β_{i−1} − β_{i−2}) + u_{i}, where u_{i} ∼ normal(0, τ^{2}). The amount of smoothness is determined by the variance parameter τ^{2}, which must also be estimated, and we assumed τ^{2} ∼ inverse-gamma(0.001, 0.001). The parameters to be estimated are therefore the spline coefficients, τ^{2}, and the initial pulse of infection to seed the epidemic.
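The recursion for the spline coefficients is equivalent to assuming that their second differences are normally distributed with variance τ^{2}. The following is a minimal sketch of that penalty only, not the EPP 2011 implementation; the function names and the example coefficient vectors are hypothetical.

```python
import math


def second_differences(beta):
    """Return u_i = beta_i - 2*beta_{i-1} + beta_{i-2} for i >= 2."""
    return [beta[i] - 2.0 * beta[i - 1] + beta[i - 2] for i in range(2, len(beta))]


def log_penalty(beta, tau2):
    """Sum of normal(0, tau2) log densities of the second differences."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * tau2) - 0.5 * u * u / tau2
        for u in second_differences(beta)
    )


# Hypothetical coefficient vectors: one on a straight line, one oscillating.
straight = [0.1 * i for i in range(7)]
wiggly = [0.0, 0.5, 0.1, 0.6, 0.0, 0.7, 0.1]

print(log_penalty(straight, 0.01) > log_penalty(wiggly, 0.01))
```

Coefficients that lie on a straight line have second differences of zero and so incur no penalty, whereas oscillating coefficients are penalised more heavily as τ^{2} shrinks.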

Short-term projections beyond the last year with surveillance data are important outputs from EPP. To facilitate these projections, the spline-based model presented previously,

In a limited number of cases, the spline-based model that employs this prior yields rapidly changing patterns of incidence when predicting epidemic behaviour beyond the data. To better understand the impact of assumptions about the future behaviour of the force of infection parameter (r), we also implemented a random walk formulation on the log scale, log(r_{t+1}) ∼ N(log(r_{t}), σ^{2}), using an empirical variance term (σ^{2}) calculated as the mean of the squared differences in adjacent values of log(r). The variance of the projection is σ^{2}_{t} = σ^{2}_{t_{l}}(t − t_{l}), where t_{l} is the last year with observed data, so that variability increases with the duration of the prediction.
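A random walk of this kind can be sketched in a few lines. The example below is illustrative only and makes several assumptions: the name `log_r` for the log of the force of infection parameter, the use of all adjacent in-sample differences for the empirical variance, and the function names are not taken from the EPP code.

```python
import math
import random


def empirical_variance(log_r):
    """Mean of squared differences between adjacent values of log_r."""
    diffs = [b - a for a, b in zip(log_r, log_r[1:])]
    return sum(d * d for d in diffs) / len(diffs)


def project_log_r(log_r_last, sigma2, n_years, rng):
    """Simulate one random walk path for n_years beyond the last data year."""
    path = []
    current = log_r_last
    for _ in range(n_years):
        # Each step is drawn around the previous value with variance sigma2.
        current = rng.gauss(current, math.sqrt(sigma2))
        path.append(current)
    return path


# Hypothetical in-sample values of the force of infection parameter.
in_sample = [math.log(x) for x in (0.30, 0.27, 0.25, 0.24)]
sigma2 = empirical_variance(in_sample)
path = project_log_r(in_sample[-1], sigma2, 5, random.Random(2011))
```

Because the steps are independent, the variance of a simulated path k years beyond the last data year is k times the one-step variance, which is the linear growth in variability described above.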

We assessed the performance of the equilibrium prior and the random walk approaches to future projections by fitting the model to a subset of the data truncated at 5 years before the last year with surveillance data, and computing model predictions for the 5 truncated years. Based on these out-of-sample predictions, we calculated the coverage and width of clinic-specific prediction intervals, and the mean absolute error (MAE) of observed clinic prevalence versus the posterior median of predicted prevalence.
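The three summary statistics can be computed from posterior predictive draws as in the sketch below. It assumes that each withheld clinic observation is paired with a list of simulated prevalence values, and it uses a simple linear-interpolation quantile; the function names and data layout are hypothetical rather than those used in the analysis.

```python
import math


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def summarise_predictions(observed, draws, level=0.95):
    """Coverage, mean interval width and MAE against the posterior median."""
    tail = (1.0 - level) / 2.0
    covered, widths, abs_errors = 0, [], []
    for y, d in zip(observed, draws):
        s = sorted(d)
        lower, upper = quantile(s, tail), quantile(s, 1.0 - tail)
        covered += int(lower <= y <= upper)
        widths.append(upper - lower)
        abs_errors.append(abs(y - quantile(s, 0.5)))
    n = len(observed)
    return {
        "coverage": covered / n,
        "width": sum(widths) / n,
        "mae": sum(abs_errors) / n,
    }
```

Calling `summarise_predictions(observed, draws)` on the withheld 5 years returns one value per statistic, which can then be averaged across clinics or subepidemics as required.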

Within the EPP 2011 modelling framework, we fit the original spline-based model that uses an equilibrium prior for future projections

We also fit the spline-based model to specific risk groups in concentrated epidemics from the following countries: Argentina, Armenia, Brazil, Iran, Jamaica, Kazakhstan, Mexico, Moldova, Myanmar, Nepal, Nicaragua, Pakistan, Ukraine and Uruguay, for a total of 87 subepidemics. Despite very sparse surveillance data in many of these settings, the spline-based model generated plausible projections across this diverse set of epidemics, with the exception of the general population of Pakistan, which has extremely low prevalence (online appendix figure B). Illustrative examples of projections for prevalence are presented in

Prevalence and incidence projections for specific risk groups in Argentina, as generated from spline-based force of infection models. ‘Equilibrium prior’ projections use an informative prior for

In formal out-of-sample prediction tests for generalised epidemics, which involved simulating the posterior predicted distribution of site-level prevalence data, the equilibrium prior (coverage=83%, prediction interval width=0.090, MAE=0.020) and random walk (coverage=82%, prediction interval width=0.086, MAE=0.021) approaches had similar performance. The same statistics calculated for the urban epidemics of Botswana, Ethiopia, Gabon, Ghana, Kenya, Namibia, Rwanda, Tanzania, Uganda and Zambia (equilibrium prior: coverage=83%, prediction interval width=0.07, MAE=0.019; random walk: coverage=82%, prediction interval width=0.07, MAE=0.019) compare favourably to those for current EPP models as described elsewhere in this supplement.

Average coverage of site-level 95% prediction intervals for final year of out-of-sample projections in 69 generalised epidemics, comparing two approaches to making future projections from a spline-based force of infection model: (1) an informative prior related to equilibrium prevalence and (2) a random walk formulation. When the two methods have different coverage for a given subepidemic, they are connected with a vertical line.

The equilibrium prior and random walk approaches to making future projections yielded similar estimates for prevalence but were more variable in terms of their predictions for incidence. In particular, the equilibrium prior could predict rapid changes in incidence in some settings, unlike the random walk approach.

Prevalence, incidence and force of infection parameter (

Ratios of projected incidence and prevalence for final year of out-of-sample projections in 69 generalised epidemics, comparing the random walk (numerator of ratio) to equilibrium prior (denominator of ratio) approaches to making future projections with a spline-based force of infection model.

We measured computational efficiency of the spline-based EPP model by counting the number of likelihood calculations that were required to obtain convergence of the IMIS algorithm, as likelihood calculation is the rate-limiting step in the fitting procedure. We focused on the 75 generalised subepidemics, as concentrated epidemics typically require less computing time due to their sparse data. The median number of likelihoods was approximately 33 000, although a few epidemics required over 100 000 likelihoods to be calculated, namely urban regions of Kenya, Namibia, Nigeria and Uganda.
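Counting likelihood evaluations requires only a thin wrapper around the likelihood function. The sketch below shows one way to do this; the toy likelihood is a hypothetical stand-in, and neither the EPP likelihood nor the IMIS algorithm is implemented here.

```python
import math


class CountingLikelihood:
    """Wrap a log-likelihood function and count how often it is called."""

    def __init__(self, log_likelihood):
        self._log_likelihood = log_likelihood
        self.n_calls = 0

    def __call__(self, params):
        self.n_calls += 1
        return self._log_likelihood(params)


def toy_log_likelihood(params):
    """Hypothetical stand-in: standard normal log density of one parameter."""
    (x,) = params
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * x * x


counted = CountingLikelihood(toy_log_likelihood)
# Evaluate on a grid, as a fitting algorithm would on its sampled parameters.
values = [counted((x / 10.0,)) for x in range(-20, 21)]
print(counted.n_calls)
```

Passing the wrapped function to a fitting routine in place of the original leaves the results unchanged while recording the evaluation count in `n_calls`.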

The length of the projection period beyond the last year of observed data could affect in-sample fit as, for example, the spacing of the basis splines within the function used to generate curves for

A spline-based force of infection model implemented within the EPP 2011 framework generated robust in-sample prevalence and incidence projections across 162 epidemics. Of particular note, approximately half of these were concentrated epidemics, which have sparse data and have been difficult to fit with other flexible modelling specifications. Two approaches to making future projections beyond the last year with surveillance data were more likely to differ in their projections of incidence than their projections of prevalence, and both of these approaches performed well in formal out-of-sample prediction tests for prevalence. From a computational standpoint, these spline-based models should perform at least as efficiently as the current flexible model implemented in EPP 2011, as they have fewer parameters to estimate. This is an important consideration for end-users of the software who desire reasonable computing times when making projections.

An important observation from this study is that future projections of incidence, which are increasingly being used in reports on the global HIV/AIDS epidemic,

Using a spline-based model for in-sample fit, combined with a random walk for out-of-sample projections, can be viewed as a hybrid implementation of previous proposals.

The models considered here are not without limitations. Modelling the force of infection parameter with splines only can lead to unstable projections at the data boundary, requiring either a prior for

In conclusion, an approach that uses splines to generate in-sample curves for

Splines can generate well-behaved, flexible curves that can be used to allow model parameters to change value over time.

Modelling the force of infection parameter with splines may help improve the efficiency and accuracy of EPP projections.

Future work on the EPP model should consider the impact that different modelling approaches have on estimates of HIV incidence.

We thank Tim Brown for sharing Java code and documentation for the transmission model implemented in the 2011 version of the Estimation and Projection Package, Karen Stanecki, Juliana Daher, and Paloma Cuchi for assistance with surveillance data, and Le Bao and Adrian Raftery for methodological discussions.

DRH designed the study, acquired the data, programmed the model, conducted analyses and wrote the first draft of the manuscript. JAS contributed to development of the model, interpretation of results and revision of the manuscript.

This work was supported in part by the Joint United Nations Programme on HIV/AIDS.

Competing interests: None.

Commissioned; externally peer reviewed.

Readers interested in HIV surveillance data should contact UNAIDS.

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/3.0/