Karen Stanecki

Peter D Ghys

Geoff P Garnett

Catherine Mercer

As the global HIV pandemic enters its fourth decade, countries have collected longer time series of surveillance data, and AIDS-specific mortality has been substantially reduced by the increasing availability of antiretroviral treatment. A refined model with greater flexibility to fit longer time series of surveillance data is needed.

In this article, we present a new epidemiological model that allows the HIV infection rate, r(t), to change over the years. The annual change of the infection rate is modelled by a linear combination of three key factors: the past prevalence, the past infection rate and a stabilisation condition. We focus on fitting antenatal clinic (ANC) data and household surveys, which are the most commonly available data sources for generalised epidemics (defined by an overall prevalence above 1%). A hierarchical model is used to account for the repeated measurements within a clinic. A Bayesian approach is used for parameter estimation.

We evaluate the performance of the newly proposed model on the ANC data collected from urban and rural areas of 31 countries with generalised epidemics in sub-Saharan Africa. The three factors in the proposed model all contribute significantly to the reconstruction of r(t) trends. The model improves the prevalence fit over the classic Estimation and Projection Package model and provides more realistic projections when the classic model encounters problems.

The proposed model better captures the main pattern of the HIV/AIDS dynamic. It also retains the simplicity of the classic model, with a few parameters that are easy to interpret and estimate.

Combating the AIDS epidemic requires quantitative analysis because countries need to ground their AIDS strategies in an understanding of their own epidemics and their national responses. Due to the paucity of reliable information on the incidence of AIDS in developing countries, sentinel surveillance systems for HIV are designed to provide information on prevalence trends to policy makers and programme planners. For the purpose of surveillance, UNAIDS and WHO suggest a classification that describes the epidemic by its current state, that is, generalised, concentrated or low level. In generalised epidemics, HIV prevalence is consistently over 1% in pregnant women in urban areas. The percentage of HIV-positive cases is often estimated among antenatal clinic (ANC) patients to represent the general population. In low level or concentrated epidemics, HIV infection has never expanded to a significant level in the general population. The surveillance data are often gathered from each identified most-at-risk population, for example, sexually transmitted disease clinic patients, injecting drug users and men who have sex with men.

To fill in the information gap on the number of individuals living with HIV/AIDS, the rate of new infections, and the need for intervention and treatment, WHO proposed the AIDS epidemic software called EpiModel in the early 1990s when few surveillance data were available. EpiModel constructs HIV incidence curves based on two inputs: start year of epidemic and recent national adult prevalence. It uses a two-parameter γ function to describe the shape of the HIV incidence curve.

As more data became available in the 1990s, the UNAIDS Reference Group developed the Estimation and Projection Package (EPP), which uses a generic epidemiological model. The epidemiological model in EPP 2009 incorporates population change over time by fitting four input parameters: r, the rate of infection; t_{0}, the start year of the epidemic; f_{0}, the initial fraction of the adult population at risk of infection; and φ, a behaviour change parameter.

As the global HIV pandemic enters its fourth decade, countries have collected longer time series of antenatal surveillance data. With the current EPP model, it has been found that some patterns are hard to reproduce.

Here, we describe a flexible epidemiological model that can both fit the data well and yield realistic projections. In the Methods section we review the EPP model and describe the proposed alternative epidemiological model. In the Results section, we present results for 31 countries with generalised epidemics. In the Discussion section, we offer some conclusions.

The UNAIDS EPP, EPP 2011, is based on a simple susceptible–infected–removed epidemiological model.

The number of new adults entering the population at time t, E(t), depends on the population size 15 years earlier, the birth rate and the survival rate from birth to age 15. r(t) is the average infection risk, µ is the non-AIDS death rate, −a_{50}(t) is the number of adults who exit the model after attaining age 50 and M(t) is the net migration into the population. Because of the increasing coverage of antiretroviral therapy (ART), the infected group Y(t) is further decomposed according to CD4 counts. As implemented in this manuscript, we divide Y(t) into three compartments: those at an early stage of infection, those eligible for first-line ART (eg, those having CD4 counts between 200 cells/mm^{3} and 350 cells/mm^{3}) and those eligible for second-line ART (eg, those having CD4 counts below 200 cells/mm^{3}). The survival rates of those eligible for ART also depend on whether they receive the treatment.

From EPP 2011 implementation experience, we find the Gaussian random walk model provides similar trends of r(t) across countries with generalised epidemics (see […] β_{3}<0. β_{0} can be interpreted as an equilibrium condition at which the current infection rate does not lead to any shift of log(r_{t}). β_{1} describes how log(r_{t}) changes when the current infection rate differs from its equilibrium value. For positive β_{0} and β_{1}, r(t) increases if its current value is less than β_{0}, and decreases otherwise. The mean shift is also related to the prevalence. β_{2} is the expected change of log(r_{t}) given a unit increase in prevalence, and we expect β_{2}<0 so that the higher the prevalence, the more likely the infection rate is to decrease. Since we have observed longer time series data, and for many countries prevalence has stabilised, we want to restrict the change of r(t) in the later period of the epidemic. With β_{3}<0, the third term γ_{t}=(ρ_{t+1}−ρ_{t})(t−t_{0}−t_{1})^{+}/ρ_{t} is the relative change of prevalence times the positive part of t−t_{0}−t_{1}, and it implies that the prevalence tends to stabilise after t_{0}+t_{1}. We refer to the above models (1) and (2) together as the r-trend model.
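The display equation for the annual change of log(r_{t}) is not reproduced in this text. Based only on the verbal description above, a plausible form is log r_{t+1} − log r_{t} = β_{1}(β_{0} − r_{t}) + β_{2}ρ_{t} + β_{3}γ_{t}. The following minimal sketch evaluates that assumed form; the function names and argument layout are hypothetical and are not taken from the EPP software.

```python
def gamma_term(rho_t, rho_next, t, t0, t1):
    """Relative change in prevalence times the positive part of t - t0 - t1."""
    years_past_stabilisation = max(t - t0 - t1, 0.0)
    return (rho_next - rho_t) * years_past_stabilisation / rho_t


def log_r_increment(r_t, rho_t, rho_next, t, t0, t1, beta):
    """Assumed mean shift log r_{t+1} - log r_t (form inferred from the text).

    beta is the tuple (beta0, beta1, beta2, beta3).
    """
    beta0, beta1, beta2, beta3 = beta
    equilibrium_pull = beta1 * (beta0 - r_t)      # pushes r(t) towards beta0
    prevalence_effect = beta2 * rho_t             # beta2 < 0: high prevalence lowers r(t)
    stabilisation = beta3 * gamma_term(rho_t, rho_next, t, t0, t1)
    return equilibrium_pull + prevalence_effect + stabilisation


# Illustrative values: the cross-dataset means reported later in the article.
beta_mean = (0.46, 0.17, -0.68, -0.038)
shift = log_r_increment(r_t=0.46, rho_t=0.10, rho_next=0.11,
                        t=1990, t0=1978, t1=20, beta=beta_mean)
```

In this example t is earlier than t_{0}+t_{1}, so the stabilisation term vanishes, and r_{t} equals β_{0}, so only the prevalence term contributes to the shift.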

The r(t) trends and prevalence trends fitted by the Gaussian random walk model. Different colours represent the posterior median prevalence from different countries. (A) r(t) starts with a high value to initiate the epidemic and then declines; (B) the corresponding prevalence reaches the peak and then gradually declines; (C) and (D) r(t) has a turnover when the prevalence levels off or increases after a period of steady decline.

The newly proposed r-trend model requires seven parameters. They are the starting year of the epidemic t_{0}, the number of years that the epidemic takes to stabilise t_{1}, the initial infection rate r_{0}, and four βs describing how the relative infection rate changes with prevalence, incidence and stabilisation stage. We carry out Bayesian estimation with the following prior distributions:

The lower bound of r_{0} for generalised epidemics is set at 1/11.5 because 11.5 is the expected length of the infectious period so the epidemic would not spread if r_{0} were smaller than 1/11.5. A lower bound of 1/11.5+1/d is recommended for concentrated epidemics, where d is the mean duration that people stay in the at-risk category.
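As a small worked example of these bounds, the sketch below computes the lower limit of r_{0} for both epidemic types. The function name and the example duration of 5 years are hypothetical illustrations, not values from the article.

```python
MEAN_INFECTIOUS_PERIOD_YEARS = 11.5  # expected infectious period stated in the text


def r0_lower_bound(at_risk_duration_years=None):
    """Lower bound of r0; pass a duration d only for concentrated epidemics."""
    bound = 1.0 / MEAN_INFECTIOUS_PERIOD_YEARS
    if at_risk_duration_years is not None:
        # Concentrated epidemics: people also leave the at-risk group at rate 1/d.
        bound += 1.0 / at_risk_duration_years
    return bound


generalised_bound = r0_lower_bound()        # 1/11.5
concentrated_bound = r0_lower_bound(5.0)    # 1/11.5 + 1/5, with d = 5 assumed
```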

The ANC data consist of the number of infected women, Y_{st}, and the number of women tested, N_{st}, for clinic s in year t. Let ρ_{t} be the overall population prevalence in year t, and let X_{st}=(Y_{st}+0.5)/(N_{st}+1). A hierarchical model is used to define the likelihood, with a random clinic effect b_{s} accounting for the repeated measurements within a clinic:

where Φ^{−1} is the inverse of the standard normal cumulative distribution function, and ɛ_{st} are independent normal errors.
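The likelihood equation itself is not reproduced in this text. A minimal sketch of one consistent reading is given below, assuming the probit-scale form Φ^{−1}(X_{st}) = Φ^{−1}(ρ_{t}) + b_{s} + ɛ_{st}. The function names and the fixed error SD passed as an argument are assumptions made for illustration only.

```python
from math import log
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and SD 1


def probit_prevalence(infected, tested):
    """Probit transform of X_st = (Y_st + 0.5) / (N_st + 1)."""
    x_st = (infected + 0.5) / (tested + 1.0)
    return _STD_NORMAL.inv_cdf(x_st)


def clinic_log_likelihood(infected, tested, rho_t, clinic_effect, error_sd):
    """Gaussian log-likelihood of one clinic-year on the probit scale.

    Assumed mean: inverse normal CDF of rho_t plus the clinic effect b_s.
    error_sd is an assumed SD for the independent normal error.
    """
    w_st = probit_prevalence(infected, tested)
    mean = _STD_NORMAL.inv_cdf(rho_t) + clinic_effect
    return log(NormalDist(mean, error_sd).pdf(w_st))


# Example: 20 positives among 200 women tested, population prevalence 10%.
example = clinic_log_likelihood(20, 200, rho_t=0.10, clinic_effect=0.0, error_sd=0.1)
```

Working on the probit scale maps prevalences in (0, 1) onto the whole real line, which is why the added 0.5 and 1 in X_{st} matter: they keep the transform finite when a clinic reports zero positives.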

To evaluate the goodness of fit and predictive validity of the r-trend model, we fit models based on the full data time series as well as assessing 5-year out-of-sample projections from models fit to truncated data.
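The truncation step can be sketched as follows, assuming that the last 5 years of observations are held out and the model is fitted to the remainder. The exact truncation rule is not given in the text, and the function and variable names are hypothetical.

```python
def split_for_projection(observations, horizon_years=5):
    """Split (year, prevalence) records into fitting data and held-out data.

    Records from the final `horizon_years` years are held out for
    out-of-sample evaluation; earlier records are used for fitting.
    """
    last_year = max(year for year, _ in observations)
    cutoff = last_year - horizon_years
    training = [obs for obs in observations if obs[0] <= cutoff]
    holdout = [obs for obs in observations if obs[0] > cutoff]
    return training, holdout


# Made-up clinic series, for illustration only.
series = [(2000, 0.08), (2002, 0.09), (2004, 0.10), (2006, 0.09), (2008, 0.09)]
training, holdout = split_for_projection(series)
```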

The most commonly available ANC data tend to be biased upwards because pregnant women are more sexually active than the general population. Many countries with generalised HIV/AIDS epidemics also have one or more nationally representative household-based Demographic and Health Surveys (DHS) that include HIV tests. DHS can serve as approximately unbiased estimates of HIV prevalence, and thus can be used to adjust the bias of ANC data. We can incorporate the DHS HIV prevalence, denoted by X_{dhs,t}, into the likelihood as follows:

We assume that the clinic effects b_{s} in

We evaluate the r-trend model using the data from urban and rural areas of the following 31 countries:

We fit the r-trend model to 62 datasets using the priors β_{i}∼N(0, 0.2). All of the β_{i}s are significant at the 0.05 level. Moreover, the signs of the coefficients are as expected: we get positive β_{0} and β_{1}, and negative β_{2} and β_{3}, for each individual dataset. The mean and SD of the estimates from the 62 datasets are shown in […] _{1} and log r_{0} for countries with generalised epidemics. The default prior distribution of t_{0} is still uniform (1970, 1990).

Summary of parameter estimates across 62 datasets

Statistic | β_{0} | β_{1} | β_{2} | β_{3} | t_{0} | t_{1} | log r_{0} |
---|---|---|---|---|---|---|---|
Mean | 0.46 | 0.17 | −0.68 | −0.038 | 1978 | 20 | 0.42 |
SD | 0.12 | 0.07 | 0.24 | 0.009 | 4.3 | 4.5 | 0.23 |

The proposed r-trend model has yielded satisfactory results on each dataset. […] β_{3} parameter of the r-trend model assumes the more stable prevalence that has been most commonly observed across countries. For the Democratic Republic of the Congo, the classic EPP fits a straight line through the data period. The median prevalence of the r-trend model better captures the quadratic curve of the observed data.


Results from Kenya, Uganda, Democratic Republic of the Congo, Namibia, Nigeria and Tanzania: coloured dots are observed prevalence from different sites; the black line is the classic model trajectory; the blue solid line is the median trajectory of the proposed model; the dashed blue lines are the 95% credible intervals of the proposed model; and the red solid line is the data trend averaged over all clinics at each year. Note that the infection rate of the classic model is a constant and hence not shown in the figure.

For each dataset, we calculate the coverage and the width of the 95% clinic-specific intervals, the mean absolute error (MAE), the mean error (ME) and the computing time.
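These summary measures can be computed as in the sketch below. It assumes that errors are taken as the posterior median minus the observed clinic prevalence; the sign convention is not stated in the text, and the function name and input layout are hypothetical.

```python
def interval_metrics(observed, median, lower, upper):
    """Coverage and mean width of the intervals, plus MAE and ME of the medians.

    All arguments are equal-length sequences of prevalences (proportions),
    one entry per clinic-year observation.
    """
    n = len(observed)
    covered = sum(1 for y, lo, hi in zip(observed, lower, upper) if lo <= y <= hi)
    errors = [m - y for m, y in zip(median, observed)]  # assumed sign: estimate - data
    return {
        "coverage": covered / n,
        "width": sum(hi - lo for lo, hi in zip(lower, upper)) / n,
        "mae": sum(abs(e) for e in errors) / n,
        "me": sum(errors) / n,
    }


# Made-up numbers, for illustration only.
metrics = interval_metrics(
    observed=[0.10, 0.20, 0.30],
    median=[0.12, 0.18, 0.30],
    lower=[0.05, 0.21, 0.25],
    upper=[0.15, 0.25, 0.35],
)
```

Under this convention a positive ME indicates that the fitted medians sit above the data on average, and a negative ME indicates that they sit below.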

Comparisons between the clinic-specific posterior median and the clinic data: coverage and width of 95% CI, mean absolute error (MAE) and mean error (ME).

Measure | Classic EPP (in-sample fit) | r-Trend model (in-sample fit) | Classic EPP (out-of-sample projection) | r-Trend model (out-of-sample projection) |
---|---|---|---|---|
Coverage | 86.7% | 87.7% | 76.5% | 80.1% |
Width | 0.071 | 0.070 | 0.097 | 0.077 |
MAE | 0.017 | 0.016 | 0.029 | 0.023 |
ME | 0.002 | 0.002 | 0.008 | −0.002 |
Computing time | 1.28 h | 1.76 h | 1.34 h | 1.50 h |

The results of the in-sample fit are evaluated over the entire data period and the results of the out-of-sample projection are evaluated over the 5-year projection period.

EPP, Estimation and Projection Package.


Last data year comparisons between the clinic-specific posterior median and the clinic data: coverage and width of 95% CI, mean absolute error (MAE) and mean error (ME).

Measure | Classic EPP (in-sample fit) | r-Trend model (in-sample fit) | Classic EPP (out-of-sample projection) | r-Trend model (out-of-sample projection) |
---|---|---|---|---|
Coverage | 87.4% | 89.4% | 74.9% | 79.0% |
Width | 0.067 | 0.065 | 0.108 | 0.080 |
MAE | 0.015 | 0.013 | 0.028 | 0.024 |
ME | 0.002 | 0.001 | 0.011 | −0.003 |

The results are evaluated only in the last data year.

EPP, Estimation and Projection Package.

Finally, the r-trend estimates of HIV prevalence in nine countries with multiple national population-based surveys are shown in

Estimates of prevalence from the r-trend models incorporating national population-based surveys: the blue solid line is the median trajectory of the proposed model; the dashed blue lines are the 95% credible intervals of the proposed model; the red solid line is the data trend averaged over all clinics at each year; and red dots are the estimates from national population-based surveys.

In the last decade, the classic EPP model fitted the data trends well for countries with generalised epidemics. However, as countries have obtained longer time series of data, a number of countries have proved challenging to fit using EPP. The classic EPP model imposes a strong structure on the HIV prevalence trend: the epidemic spreads, declines after a peak and then either levels off or keeps declining towards extinction. It is hard for the classic EPP curves to fit a second peak of prevalence after a steady decline.

Here, we propose a new model in which the infection rate depends on the development of the epidemic and prevention systems. It offers greater flexibility than the classic EPP model, and it can also be parsimonious through careful variable selection. The new model combines the advantages of the previous models. It retains the simplicity of EPP, so that the parameters are easy to interpret and estimate, and it adds some flexibility to EPP to represent country-specific structure. An attractive feature of the proposed parsimonious model is that it allows a hierarchical structure to be imposed for areas within a country and for countries within a region, so that an area or country with fewer observations can borrow strength from its neighbours. We will present a more comprehensive analysis of the hierarchical model in another article.

Note that the results are based on illustrative HIV prevalence data for these countries, which may not be complete. These results should therefore not be seen as replacing or competing with official estimates regularly published by countries and UNAIDS.

Countries have obtained longer time series of HIV surveillance data in recent years, and the patterns of HIV epidemics have become more complex.

The four-parameter model in the UNAIDS Estimation and Projection Package does not have enough flexibility to capture some new patterns, for example, a rise in prevalence after a period of steady decline.

A seven-parameter model is proposed in which the changes of infection rates are modelled parsimoniously. It yields more satisfactory results than the classic model.

This research was supported by NICHD grant HD054511, the National Center for Advancing Translational Sciences, grant UL1TR000127, and the Joint United Nations Programme on HIV/AIDS. The authors are grateful to Adrian Raftery, Xiaoyue Niu, Dan Hogan, Joshua Salomon, Tim Brown, Peter Ghys, Karen Stanecki, Juliana Daher and the four reviewers for helpful discussions and insightful suggestions.

LB contributed in developing the method, performing the analysis and writing the manuscript.

The Joint United Nations Programme on HIV/AIDS and NICHD.

None.

Commissioned; externally peer reviewed.

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/3.0/